Spire.Office Knowledgebase Page 8 | E-iceblue

Convert CSV to JSON Python Tutorial

CSV (Comma-Separated Values) is a widely used format for tabular data. It’s lightweight, easy to generate, and common in reports, logs, exports, and data feeds. However, modern web applications, APIs, and NoSQL databases prefer JSON for its hierarchical structure, flexibility, and compatibility with JavaScript.

Converting CSV to JSON in Python is a practical skill for developers who need to:

  • Prepare CSV data for APIs and web services
  • Migrate CSV exports into NoSQL databases like MongoDB
  • Transform flat CSV tables into nested JSON objects
  • Enable data exchange between systems that require hierarchical formats

This step-by-step tutorial shows you how to convert CSV files to JSON in Python, including flat JSON, nested JSON, JSON with grouped data, and JSON Lines (NDJSON). By the end, you’ll be able to transform raw CSV datasets into well-structured JSON ready for APIs, applications, or data pipelines.

Table of Contents

Why Convert CSV to JSON?

CSV files are lightweight and tabular, but they lack hierarchy. JSON allows structured, nested data ideal for APIs and applications. Converting CSV to JSON enables:

  • API Integration: Most APIs prefer JSON over CSV
  • Flexible Data Structures: JSON supports nested objects
  • Web Development: JSON works natively with JavaScript
  • Database Migration: NoSQL and cloud databases often require JSON
  • Automation: Python scripts can process JSON efficiently

Python CSV to JSON Converter – Installation

To convert CSV files to JSON in Python, this tutorial uses Spire.XLS for Python to read CSV files and Python’s built-in json module to handle JSON conversion.

Why Spire.XLS?

It simplifies working with CSV files by allowing you to:

  • Load CSV files into a workbook structure for easy access to rows and columns
  • Extract and manipulate data efficiently, cell by cell
  • Convert CSV to JSON in flat, nested, or NDJSON formats
  • Export CSV to Excel, PDF, and other formats if needed

Install Spire.XLS

You can install the library directly from PyPI using pip:

pip install spire.xls

If you need detailed guidance on the installation, refer to this tutorial: How to Install Spire.XLS for Python on Windows.

Once installed, you’re ready to convert CSV data into different JSON formats.

Convert CSV to Flat JSON in Python

Converting a CSV file to flat JSON turns each row into a separate JSON object and uses the first row as keys, making the data organized and easy to work with.

Steps to Convert CSV to Flat JSON

  • Load the CSV file into a workbook using Workbook.LoadFromFile.
  • Select the worksheet.
  • Extract headers from the first row.
  • Iterate through each subsequent row to map values to headers.
  • Append each row dictionary to a list.
  • Write the list to a JSON file using json.dump.

Code Example

from spire.xls import *
import json

# Load the CSV file into a workbook object
workbook = Workbook()
workbook.LoadFromFile("employee.csv", ",")

# Select the desired worksheet
sheet = workbook.Worksheets[0]

# Extract headers from the first row
headers = [sheet.Range[1, j].Text for j in range(1, sheet.LastColumn + 1)]

# Map the subsequent CSV rows to JSON objects
data = []
for i in range(2, sheet.LastRow + 1):
    row = {headers[j-1]: sheet.Range[i, j].Text for j in range(1, sheet.LastColumn + 1)}
    data.append(row)

# Write JSON to file
with open("output_flat.json", "w", encoding="utf-8") as f:
    json.dump(data, f, indent=4, ensure_ascii=False)

# Clean up resources
workbook.Dispose()

Output JSON

Python CSV to Flat JSON example

Convert CSV to Nested JSON in Python

When a single CSV row contains related columns, you can combine these columns into nested JSON objects. For example, merging the Street and City columns into an Address object. Each CSV row produces one JSON object, which can include one or more nested dictionaries. This approach is ideal for scenarios requiring hierarchical data within a single record, such as API responses or application configurations.

Steps to Convert CSV to Nested JSON

  • Load the CSV file and select the worksheet.
  • Decide which columns should form a nested object (e.g., street and city).
  • Iterate over rows and construct each JSON object with a sub-object for nested fields.
  • Append each nested object to a list.
  • Write the list to a JSON file with json.dump.

Code Example

from spire.xls import *
import json

# Create a Workbook instance and load the CSV file (using comma as the delimiter)
workbook = Workbook()
workbook.LoadFromFile("data.csv", ",")

# Get the first worksheet from the workbook
sheet = workbook.Worksheets[0]

# List to store the converted JSON data
data = []

# Loop through rows starting from the second row (assuming the first row contains headers)
for i in range(2, sheet.LastRow + 1):
    # Map each row into a JSON object, including a nested "Address" object
    row = {
        "ID": sheet.Range[i, 1].Text,         # Column 1: ID
        "Name": sheet.Range[i, 2].Text,       # Column 2: Name
        "Address": {                          # Nested object for address
            "Street": sheet.Range[i, 3].Text, # Column 3: Street
            "City": sheet.Range[i, 4].Text    # Column 4: City
        }
    }
    # Add the JSON object to the list
    data.append(row)

# Write the JSON data to a file with indentation for readability
with open("output_nested.json", "w", encoding="utf-8") as f:
    json.dump(data, f, indent=4, ensure_ascii=False)

# Release resources used by the workbook
workbook.Dispose()

Output Nested JSON

Python CSV to Nested JSON example

Convert CSV to JSON with Grouped Data

When multiple CSV rows belong to the same parent entity, you can group these rows under a single parent object. For example, an order with multiple items can store all items in an items array under one order object. Each parent object has a unique key (like order_id), and its child rows are aggregated into an array. This method is useful for e-commerce orders, data pipelines, or any scenario requiring grouped hierarchical data across multiple rows.

Steps to Convert CSV to JSON with Grouped Data

  • Use defaultdict to group rows by a parent key (order_id).
  • Iterate rows and append child items to the parent object.
  • Convert the grouped dictionary to a list of objects.
  • Write the JSON file.

Code Example

from collections import defaultdict
from spire.xls import *
import json

# Create a Workbook instance and load the CSV file (comma-separated)
workbook = Workbook()
workbook.LoadFromFile("orders.csv", ",")

# Get the first worksheet from the workbook
sheet = workbook.Worksheets[0]

# Use defaultdict to store grouped data
# Each order_id maps to a dictionary with customer name and a list of items
data = defaultdict(lambda: {"customer": "", "items": []})

# Loop through rows starting from the second row (skip header row)
for i in range(2, sheet.LastRow + 1):
    order_id = sheet.Range[i, 1].Text      # Column 1: Order ID
    customer = sheet.Range[i, 2].Text      # Column 2: Customer
    item = sheet.Range[i, 3].Text          # Column 3: Item
    
    # Assign customer name (same for all rows with the same order_id)
    data[order_id]["customer"] = customer
    # Append item to the order's item list
    data[order_id]["items"].append(item)

# Convert the grouped dictionary into a list of objects
# Each object contains order_id, customer, and items
result = [{"order_id": oid, **details} for oid, details in data.items()]

# Write the grouped data to a JSON file with indentation for readability
with open("output_grouped.json", "w", encoding="utf-8") as f:
    json.dump(result, f, indent=4, ensure_ascii=False)

# Release resources used by the workbook
workbook.Dispose()

Output JSON with Grouped Data

Python CSV to Grouped JSON example

If you're also interested in saving JSON back to CSV, follow our guide on converting JSON to CSV in Python.

Convert CSV to JSON Lines (NDJSON) in Python

JSON Lines (also called NDJSON – Newline Delimited JSON) is a format where each line is a separate JSON object. It is ideal for large datasets, streaming, and big data pipelines.

Why use NDJSON?

  • Streaming-friendly: Process one record at a time without loading the entire file into memory.
  • Big data compatibility: Tools like Elasticsearch, Logstash, and Hadoop natively support NDJSON.
  • Error isolation: If one line is corrupted, the rest of the file remains valid.

Code Example

from spire.xls import *
import json

# Create a Workbook instance and load the CSV file (comma-separated)
workbook = Workbook()
workbook.LoadFromFile("employee.csv", ",")

# Get the first worksheet from the workbook
sheet = workbook.Worksheets[0]

# Extract headers from the first row to use as JSON keys
headers = [sheet.Range[1, j].Text for j in range(1, sheet.LastColumn + 1)]

# Open a file to write JSON Lines (NDJSON) format
with open("output.ndjson", "w", encoding="utf-8") as f:
    # Loop through each row in the worksheet, starting from the second row
    for i in range(2, sheet.LastRow + 1):
        # Map each header to its corresponding cell value for the current row
        row = {headers[j - 1]: sheet.Range[i, j].Text for j in range(1, sheet.LastColumn + 1)}
        
        # Write the JSON object to the file followed by a newline
        # Each line is a separate JSON object (NDJSON format)
        f.write(json.dumps(row, ensure_ascii=False) + "\n")

# Release resources used by the workbook
workbook.Dispose()

Output NDJSON

Python CSV to NDJSON example

Handle Large CSV Files to JSON Conversion

For large CSV files, it’s not always efficient to load everything into memory at once. With Spire.XLS, you can still load the file as a worksheet, but instead of appending everything into a list, you can process rows in chunks and write them to JSON incrementally. This technique minimizes memory usage, making it suitable for big CSV to JSON conversion in Python.

Code Example

from spire.xls import *
import json

# Create a Workbook instance and load the CSV file (comma-separated)
workbook = Workbook()
workbook.LoadFromFile("large.csv", ",")

# Get the first worksheet from the workbook
sheet = workbook.Worksheets[0]

# Open a JSON file for writing, with UTF-8 encoding
with open("large.json", "w", encoding="utf-8") as json_file:
    json_file.write("[\n")  # Start the JSON array
    first = True  # Flag to handle commas between JSON objects

    # Loop through each row in the worksheet, starting from the second row
    # (skip the header row)
    for i in range(2, sheet.LastRow + 1):
        # Create a dictionary mapping each header to its corresponding cell value
        row = {sheet.Range[1, j].Text: sheet.Range[i, j].Text
               for j in range(1, sheet.LastColumn + 1)}

        # Add a comma before the object if it is not the first row
        if not first:
            json_file.write(",\n")

        # Write the JSON object to the file
        json.dump(row, json_file, ensure_ascii=False)
        first = False  # After the first row, set the flag to False

    json_file.write("\n]")  # End the JSON array

# Release resources used by the workbook
workbook.Dispose()

Best Practices for CSV to JSON Conversion

When converting CSV to JSON in Python, follow these best practices can ensure data integrity and compatibility:

  • Always Use CSV headers as JSON keys.
  • Handle missing values with null or default values.
  • Normalize data types (convert numeric strings to integers or floats).
  • Use UTF-8 encoding for JSON files.
  • Stream large CSV files row by row to reduce memory usage.
  • Validate JSON structure after writing, especially for nested JSON.

Conclusion

Converting CSV to JSON in Python helps you work with data more efficiently and adapt it for modern applications. Using Python and libraries like Spire.XLS for Python, you can:

  • Convert flat CSV files into structured JSON objects.
  • Organize related CSV data into nested JSON structures.
  • Group multiple CSV rows into coherent JSON objects for analysis or APIs.
  • Create JSON Lines (NDJSON) for large datasets or streaming scenarios.
  • Process large CSV files efficiently without loading everything into memory.

These approaches let you handle CSV data in a way that fits your workflow, making it easier to prepare, share, and analyze data for APIs, applications, or big data pipelines.

FAQs

Q1: How do I convert CSV to JSON with headers in Python?

A1: If your CSV has headers, use the first row as keys and map subsequent rows to dictionaries. With Spire.XLS, you can access sheet.Range[1, j].Text for headers.

Q2: How do I convert CSV to nested JSON in Python?

A2: Identify related columns (e.g., Street and City) and group them into a sub-object when building JSON. See the Nested JSON example above.

Q3: What’s the best way to handle large CSV files when converting to JSON?

A3: Use a streaming approach where each row is processed and written to JSON immediately, instead of storing everything in memory.

Q4: Can Spire.XLS handle CSV files with different delimiters?

A4: Yes, when loading the CSV with Spire.XLS’s LoadFromFile method, specify the delimiter (e.g., "," or ";").

Q5: How to convert JSON back to CSV in Python?

A5: Use Python’s json module to read the JSON file into a list of dictionaries, then write it back to CSV using Spire.XLS for Python for advanced formatting and export options.

Q6: How to convert CSV to JSON Lines (NDJSON) in Python?

A6: JSON Lines (NDJSON) writes each JSON object on a separate line. Stream each CSV row to the output file line by line, which is memory-efficient and compatible with big data pipelines like Elasticsearch or Logstash.

Tutorial on How to Write Data to Excel Worksheets Using Python Code

Excel remains one of the most widely used tools for organizing, analyzing, and presenting data. From financial reports to operational dashboards, many workflows require exporting data into Excel for better readability and sharing. Instead of manually entering information, automating Excel file writing with Python makes it faster, more reliable, and more scalable.

This tutorial explains how to write data to Excel files with Python, covering structured data insertion, formatting, and exporting. The examples use a Python Excel library that allows programmatic creation and customization of workbooks.

What's Included in This Tutorial:

  1. Setting Up the Environment
  2. Writing Data into Excel Files
  3. Formatting While Writing
  4. Working with Multiple Worksheets
  5. Best Practices
  6. Conclusion
  7. FAQ

Setting Up the Environment

Before writing Excel files in Python, you need a library that supports creating, loading, and saving workbooks programmatically. Spire.XLS for Python provides a complete API for these operations, enabling automated report generation and data processing.

Install the package using pip:

pip install spire.xls

Once installed, you can handle Excel files using three core operations:

  • Creating a new workbook – initialize a new Excel document with Workbook().
  • Loading an existing workbook – open an existing Excel file using LoadFromFile().
  • Saving a workbook – export the workbook to the desired format with SaveToFile(), supporting .xlsx, .xls, CSV, and more.

These operations form the foundation for further data writing, formatting, and multi-sheet management in Python.


Writing Data into Excel Files with Python

In real-world business scenarios, you may need to create new Excel files, update existing reports, or write different types of data—such as text, numbers, dates, and formulas. This section demonstrates how to efficiently write and manage data in Excel files with Python across these common use cases.

Appending Data to an Existing Excel File

When you need to update an existing Excel workbook with new information—such as adding recent sales records, inventory updates, or additional data rows—you can open the file, append the data programmatically, and save it without overwriting existing content:

from spire.xls import Workbook, ExcelVersion

workbook = Workbook()
workbook.LoadFromFile("Sample.xlsx")
sheet = workbook.Worksheets[0]

# Add new rows
sheet.Range["A4"].Value = "Laptop"
sheet.Range["B4"].NumberValue = 5
sheet.Range["C4"].NumberValue = 1200.00

sheet.Range["A5"].Value = "Monitor"
sheet.Range["B5"].NumberValue = 10
sheet.Range["C5"].NumberValue = 300.00

workbook.SaveToFile("output/updated_excel.xlsx", ExcelVersion.Version2016)

Key elements used:

  • LoadFromFile() – loads an existing Excel file into the workbook object.
  • Range["CellName"] – references a specific cell in the sheet using its name.
  • Value / NumberValue – assigns text or numeric data to cells.
  • SaveToFile() – saves the workbook to a file in the specified Excel format.

This method allows continuous updates to reports while preserving existing content.

Example showing appended data:

Append rows to an existing Excel file in Python

Writing Multiple Rows and Columns to a New Excel File

When dealing with larger datasets, writing multiple rows and columns at once is much more efficient than updating individual cells one by one. This approach not only saves time but also ensures consistent data insertion across the worksheet:

from spire.xls import Workbook, ExcelVersion

# Create a new Excel workbook
workbook = Workbook()
sheet = workbook.Worksheets[0]

orders = [
    ["OrderID", "Customer", "Product", "Quantity", "Price", "Status"],
    [1001, "Alice", "Laptop", 2, 1200.00, "Shipped"],
    [1002, "Bob", "Monitor", 1, 300.00, "Pending"],
    [1003, "Charlie", "Keyboard", 5, 45.00, "Delivered"],
    [1004, "David", "Mouse", 3, 25.00, "Shipped"],
    [1005, "Eva", "Tablet", 1, 450.00, "Pending"]
]

for row_index, row_data in enumerate(orders, start=1):
    for col_index, value in enumerate(row_data, start=1):
        if isinstance(value, (int, float)):
            sheet.Range[row_index, col_index].NumberValue = value
        else:
            sheet.Range[row_index, col_index].Value = value

workbook.SaveToFile("output/orders.xlsx", ExcelVersion.Version2016)

Important elements in this example:

  • enumerate() – provides row and column indices for looping.
  • Range[row, col] – references a cell in the worksheet by its row and column indexes.

Batch writing ensures efficiency, especially when exporting database query results or operational reports.

Example showing batch data insertion:

Write multiple rows and columns into Excel using Python

Writing Different Data Types

Excel cells can contain various types of data, such as text, numbers, dates, formulas, and more. Using the correct properties and methods ensures that each type is stored and displayed appropriately, allowing accurate calculations and proper formatting:

from spire.xls import Workbook, ExcelVersion, DateTime, TimeSpan

workbook = Workbook()
sheet = workbook.Worksheets[0]

# Set general value
sheet.Range[2, 2].Text = "General Example"
sheet.Range[2, 3].Value = "General 123"

# Set number value
sheet.Range[3, 2].Text = "Number Example"
sheet.Range[3, 3].NumberValue = 1234.56
sheet.Range[3, 3].NumberFormat = "0.000"

# Set datetime value
sheet.Range[4, 2].Text = "Date Example"
sheet.Range[4, 3].DateTimeValue = DateTime.get_UtcNow()

# Set formula value
sheet.Range[5, 2].Text = "Formula Example"
sheet.Range[5, 5].NumberValue = 1234.56
sheet.Range[5, 6].NumberValue = 6543.21
sheet.Range[5, 3].Formula = "=SUM(E5:F5)"

# Set text
sheet.Range[6, 2].Text = "Text Example"
sheet.Range[6, 3].Text = "Text"

# Set boolean value
sheet.Range[7, 2].Text = "Boolean Example"
sheet.Range[7, 3].BooleanValue = True

sheet.AllocatedRange.AutoFitColumns()

workbook.SaveToFile("output/value_types.xlsx", ExcelVersion.Version2016)

Key functions and properties used:

  • Value – assigns or retrieves the general value of a cell, suitable for text or mixed content.
  • NumberValue – specifically handles numeric values in a cell, ensuring proper number formatting and calculations.
  • DateTimeValue – used to input or obtain date and time values in a cell with correct formatting.
  • Formula – sets or retrieves the formula expression in a cell to perform dynamic calculations.
  • BooleanValue – stores or returns a Boolean (True/False) value in a cell.
  • Text – retrieves the displayed text of a cell, including any applied formatting.

Proper handling of different data types is essential for accurate business calculations and reporting. For more details on supported data types, see the XlsRange API reference.

Example showing mixed data types:

Write text, numbers, dates, and formulas into Excel using Python


Formatting Excel While Writing Data with Python

To make Excel reports clear and professional, it’s important to apply formatting while entering or updating data. This section demonstrates how to enhance readability and presentation by styling cells, setting number formats, and adjusting column widths and row heights as you write data into Excel.

Applying Cell Styles

You can enhance the readability and appearance of your Excel sheet by applying various styles to cells, such as fonts, borders, and background colors:

from spire.xls import Workbook, Color, FontUnderlineType, ExcelVersion

workbook = Workbook()
sheet = workbook.Worksheets[0]
sheet.Range["A1"].Value = "Product"
sheet.Range["B1"].Value = "Category"
sheet.Range["C1"].Value = "Price"
sheet.Range["D1"].Value = "Quantity"
sheet.Range["E1"].Value = "Total"
sheet.Range["A2"].Value = "MacBook Pro"
sheet.Range["B2"].Value = "Laptop"
sheet.Range["C2"].NumberValue = 999.99
sheet.Range["D2"].NumberValue = 1
sheet.Range["E2"].Formula = "=C2*D2"
sheet.Range["A3"].Value = "iPhone 16 Pro"
sheet.Range["B3"].Value = "Smartphone"
sheet.Range["C3"].NumberValue = 899.99
sheet.Range["D3"].NumberValue = 1
sheet.Range["E3"].Formula = "=C3*D3"

# Set header style
header = sheet.Range["A1:E1"]
header.Style.Font.FontName = "Arial"
header.Style.Font.Size = 14.0
header.Style.Font.IsBold = True
header.Style.Font.Underline = FontUnderlineType.Single
header.Style.Interior.Color = Color.get_LightGray()
header.Style.Borders[BordersLineType.EdgeRight].LineStyle = LineStyleType.Medium

Core components demonstrated:

  • Style.Font – controls font-related settings such as bold, underline, and more (full list of supported properties can be found in the Style.Font API documentation).
  • FontUnderlineType.Single – applies a single underline.
  • Interior.Color – fills the cell background with a specified color.
  • Borders.LineStyle – adds borders around cells.

Styled cells enhance readability and emphasize critical sections.

Setting Number Formats for Excel Cells

Numbers in Excel often require specific display formats to improve readability and presentation. Using CellRange.NumberFormat, you can control how numeric values appear, such as applying currency, percentage, or integer formats:

# Apply number formats
sheet.Range["C2:C3"].NumberFormat = "$#,##0.00"   # Currency format
sheet.Range["D2:D3"].NumberFormat = "0"           # Integer format
sheet.Range["E2:E3"].NumberFormat = "$#,##0.00"

Key highlights:

  • NumberFormat – enables reading and setting Excel cell number formats, controlling how numbers are displayed while keeping the raw data intact.
  • Format codes define display rules such as currency symbols, decimal places, or percentage styles, giving you flexibility in presenting numerical data.

With proper number formatting, financial data is easier to interpret and looks more professional. For more details and a full list of format codes, see our dedicated guide on Setting Excel Cell Number Format in Python.

Adjusting Column Widths and Row Heights

Properly adjusting column widths and row heights ensures that all content is clearly visible. You can set them manually or use automatic fitting to match the content:

# Auto-fit column widths and row heights
for col in range(1, 5):
    sheet.AutoFitColumn(col)
for row in range(1, 3):
    sheet.AutoFitRow(row)

# Auto-fit a specific range of cells
#sheet.Range["A1:E3"].AutoFitColumns()
#sheet.Range["A1:E3"].AutoFitRows()

# Set a fixed column width and row height
sheet.Columns[1].Width = 150
sheet.Rows[1].Height = 30

workbook.SaveToFile("output/formatted_excel.xlsx", ExcelVersion.Version2016)

Key highlights:

  • AutoFitColumn(colIndex) / AutoFitRow(rowIndex) – automatically adjust a single column or row to fit its content.
  • CellRange.AutoFitColumns() / AutoFitRows() – automatically adjust all columns or rows within a specified cell range.
  • Columns[colIndex].Width / Rows[rowIndex].Height – manually set a fixed width or height for precise control.

With these options, you can choose between automatic fitting for dynamic data or fixed dimensions for consistent layout, ensuring your Excel worksheets remain both readable and professionally formatted.

Example showing styled and auto-fitted headers:

Apply styles and auto-fit columns in Excel using Python

To explore more advanced techniques for formatting Excel sheets in Python, including fonts, colors, borders, and conditional formatting, check out our dedicated guide on Formatting Excel in Python for detailed instructions.


Managing Multiple Worksheets in Excel with Python

In Excel, organizing data into multiple worksheets helps keep related information separated and easy to manage. For example, you can maintain separate sheets for sales, purchases, inventory, or other categories within the same workbook. This section demonstrates how to create, access, and manage multiple worksheets using Python.

from spire.xls import Workbook, ExcelVersion

workbook = Workbook()

sheet = workbook.Worksheets[0]
sheet.Name = "Sales"

sheet1 = workbook.Worksheets["Sheet2"]
sheet1.Name = "Purchases"

sheet2 = workbook.Worksheets.Add("Inventory")
sheet2.Range["A1"].Value = "ProductID"
sheet2.Range["B1"].Value = "Stock"

workbook.SaveToFile("output/multi_sheet.xlsx", ExcelVersion.Version2016)

Main features highlighted:

  • Worksheets[Index] – access a worksheet by its position in the workbook (useful for iterating over all sheets or referencing the first/last sheet).
  • Worksheets["SheetName"] – access a worksheet by its name, which is more readable and reliable if the sheet order might change.
  • Worksheets.Add("SheetName") – create a new worksheet to organize different categories of data such as departments, sales regions, or product lines.

These methods allow you to structure your Excel file efficiently, keeping related data on separate sheets for clarity and easier management.

Example showing multiple worksheets:

Create and manage multiple worksheets in Excel using Python


Best Practices for Writing Excel Files with Python

When writing Excel files with Python, follow best practices to maintain efficiency, consistency, and usability:

  • Use descriptive sheet names like “Sales_2024” instead of “Sheet1.”
  • Batch write large datasets instead of individual cell updates to improve performance.
  • Apply consistent formatting for headers, totals, and key columns.
  • Leverage Excel formulas to maintain dynamic calculations.
  • Validate data types to prevent misinterpretation in charts or formulas.
  • Choose file formats suited to the audience: .xlsx for modern users, .xls only for legacy compatibility.
  • Organize worksheets logically, grouping related datasets for easy navigation.

Implementing these practices avoids common pitfalls and produces professional, reusable reports.


Conclusion

Automating Excel writing in Python significantly streamlines reporting. By creating workbooks, writing data efficiently, applying styles, managing worksheets, and handling diverse data types, developers can ensure consistent, accurate, and professional Excel reports. To explore the library further, you can request a free temporary license or try the Free Spire.XLS for Python edition.


Python Excel Writing FAQ

Q1: Can Python write to an existing Excel file?

Yes, Python can load an existing workbook, append or modify data, and save it while preserving all previously entered content.

Q2: How to efficiently handle large datasets in Python?

Batch writing multiple rows and minimizing formatting during data insertion helps maintain high performance even with thousands of rows.

Q3: Can formulas be included in Excel files?

Yes, you can insert formulas, including =SUM() and more complex calculations, to keep your Excel reports dynamic and automatically updated.

Q4: Which Excel formats are supported?

Spire.XLS for Python can save files in .xlsx, .xls, CSV, and even export to PDF, covering most common use cases and compatibility needs.

Python Convert CSV to XML

CSV is one of the most common formats for exchanging tabular data because of its simplicity and wide support across platforms. However, when working with structured applications, configurations, or hierarchical data, XML often becomes the preferred choice due to its ability to represent nested relationships and enforce stricter data validation.

In this guide, we’ll explore how to convert CSV files into XML using Spire.XLS for Python. You’ll learn how to convert CSV into both Excel XML format (SpreadsheetML) and standard XML for general use. We’ll also cover how to clean and preprocess real-world CSV files—handling issues like invalid headers, missing values, special characters, and nested fields—so that your XML output is always valid and structured correctly.

Table of Contents:

Why Convert CSV to XML

So why should developers care about converting CSV to XML? Here are some real-world use cases:

  • Enterprise Data Migration: Many enterprise applications like ERP or CRM systems require XML input for bulk data import.
  • Configuration & Metadata: XML is often used to store structured metadata, while raw data may come in CSV form.
  • Interoperability: Some APIs, especially in finance, healthcare, and government sectors, still rely heavily on XML payloads.
  • Readable Reports: XML can represent hierarchical data, making it more descriptive compared to flat CSV files.
  • Validation: XML can be validated against schemas (XSD), ensuring data integrity—something CSV cannot provide out-of-the-box.

CSV is great for simplicity . XML is great for structure . By converting between them, you get the best of both worlds.

Prerequisites

Before diving into the code, let’s ensure we have the right tools:

  • Python 3.7+
  • Spire.XLS for Python → A professional library that provides powerful Excel manipulation capabilities.
  • Standard Python Libraries → xml.etree.ElementTree, csv, and re.

You can install Spire.XLS via pip (assuming you have Python and pip installed on your system):

pip install spire.xls

Additionally, make sure you have a CSV file ready to test. A simple one might look like this:

ID,FirstName,LastName,Age,Department,Salary
1,John,Doe,30,Sales,5000
2,Jane,Smith,28,Marketing,4500
3,Mike,Johnson,35,IT,6000
4,Emily,Brown,32,HR,4800

Convert CSV to Excel XML Format in Python

The first approach is to convert CSV into Excel-compatible XML . This format is sometimes called SpreadsheetML , introduced in Excel 2003. It allows CSV data to be transformed into a fully structured XML file that Excel can open directly.

With Spire.XLS, this process is straightforward:

from spire.xls import *

# Create a Workbook
workbook = Workbook()

# Load CSV into the first worksheet
workbook.LoadFromFile("input.csv", ",", 1, 1)

# Save as XML Spreadsheet format
workbook.SaveAsXml("output.xml")

# Dispose resources
workbook.Dispose()

How it Works

  1. LoadFromFile() – Reads the CSV file into the workbook.
  2. SaveAsXml() – Saves it as SpreadsheetML XML format.

This approach is best if your end goal is Excel compatibility rather than pure XML processing.

Output:

Convert CSV to Excel-compatible XML

You may also like: Convert Excel to Excel-Compatible XML in Python

Convert CSV to Standard XML in Python

Sometimes, you need a custom XML format rather than Excel-compatible XML. For instance, you may need an XML structure like:

<Employees>
  <Employee>
    <ID>1</ID>
    <FirstName>John</FirstName>
    <LastName>Doe</LastName>
    <Department>Sales</Department>
    <Salary>5000</Salary>
  </Employee>
</Employees>

Here’s how you can achieve that:

from spire.xls import *
import xml.etree.ElementTree as ET

# Step 1: Load CSV into workbook
workbook = Workbook()
workbook.LoadFromFile("input.csv", ",", 1, 1)
sheet = workbook.Worksheets[0]

# Step 2: Create root element
root = ET.Element("Root")

# Step 3: Loop through rows and build XML
# Assume first row is header
headers = []
for col in range(1, sheet.Columns.Count + 1):
    cell_value = sheet.Range[1, col].Value
    if not cell_value:
        break
    headers.append(cell_value)

# Add data rows
for row in range(2, sheet.Rows.Count + 1):
    # Stop if row is empty
    if not sheet.Range[row, 1].Value:
        break

    record = ET.SubElement(root, "Record")
    for col, header in enumerate(headers, start=1):
        cell_value = sheet.Range[row, col].Value
        field = ET.SubElement(record, header)
        field.text = cell_value if cell_value else ""

# Step 4: Save to XML file
tree = ET.ElementTree(root)
tree.write("output.xml", encoding="utf-8", xml_declaration=True)

# Cleanup
workbook.Dispose()

Explanation

  • Step 1: Load the CSV into a worksheet.
  • Step 2: Create a root <Root> element for the XML tree.
  • Step 3: Extract headers from the first row and use them as XML tags.
  • Step 4: Loop through rows and add <Record> elements with child fields.
  • Step 5: Save the XML to a file.

This approach is flexible because you can customize the XML schema however you like (e.g., change <Record> to <Employee>).

Output:

Convert CSV to Standard XML

You may also like: How to Convert XML to CSV in Python

Handle Real-World CSV Challenges

Converting a “perfect” CSV to XML is easy. But real-world CSVs are rarely perfect. Let’s discuss common issues developers face and how to fix them.

  1. Headers with Invalid XML Names

    • Problem: CSV headers like Employee ID or 123Name are invalid XML tags since XML element names cannot have spaces, start with numbers, or include special characters.
    • Fix: Clean headers by replacing spaces with underscores (_) or prefixing numeric headers with a safe string.
  2. Empty or Missing Values

    • Problem: Real-world CSVs often have missing cells. When converted to XML, missing cells may create malformed structures.
    • Fix: Replace blanks with placeholders (NULL, Unknown, or 0).
  3. Special Characters in Data

    • Problem: Characters like <, >, & inside CSV values can break XML if not escaped.
    • Fix: Always escape XML entities properly (&lt;, &gt;, &amp;).
  4. Nested / Hierarchical Data in Flat CSV

    • Problem: Some CSV cells contain multiple values or hierarchical information (e.g., lists of products) within a single field. Converting directly to XML will flatten the data, losing the intended hierarchy.

Example:

OrderID,Customer,Products
1001,John,"Laptop;Mouse;Keyboard"

Flattened XML (bad):

<Products>Laptop;Mouse;Keyboard</Products>
  • Fix: Detect list-like or nested values and expand them into child XML elements or multiple rows, preserving the structure in XML.

Hierarchical XML (fixed):

<Products>
  <Product>Laptop</Product>
  <Product>Mouse</Product>
  <Product>Keyboard</Product>
</Products>

Automating Fixes with clean_csv

Instead of cleaning data manually, you can automate preprocessing with the following helper function:

import csv
import re

def clean_csv(input_file, output_file, nested_columns=None, nested_delimiter=";"):
    if nested_columns is None:
        nested_columns = []

    cleaned_rows = []

    # Escape special characters for XML
    def escape_xml(text):
        return (text.replace("&", "&amp;")
                    .replace("<", "&lt;")
                    .replace(">", "&gt;")
                    .replace('"', "&quot;")
                    .replace("'", "&apos;"))

    with open(input_file, "r", encoding="utf-8") as infile:
        reader = csv.reader(infile)
        headers = next(reader)

        # Clean headers
        cleaned_headers = []
        for h in headers:
            h = h.strip()
            h = re.sub(r"\s+", "_", h)          # Replace spaces with _
            h = re.sub(r"[^a-zA-Z0-9_]", "", h) # Remove invalid chars
            if re.match(r"^\d", h):             # Cannot start with digit
                h = "Field_" + h
            cleaned_headers.append(h)

        cleaned_rows.append(cleaned_headers)

        # Read all rows first
        raw_rows = []
        for row in reader:
            # Handle empty cells
            row = [cell if cell.strip() != "" else "NULL" for cell in row]
            raw_rows.append(row)

    # Expand nested columns
    if nested_columns:
        expanded_rows = [cleaned_headers]  # include header
        for row in raw_rows:
            row_variants = [row]
            for col_name in nested_columns:
                if col_name not in cleaned_headers:
                    continue
                col_index = cleaned_headers.index(col_name)
                temp = []
                for variant in row_variants:
                    cell_value = variant[col_index]
                    # Only split on the nested delimiter, not on XML special chars
                    if nested_delimiter in cell_value:
                        items = [item.strip() for item in cell_value.split(nested_delimiter)]
                        for item in items:
                            new_variant = variant.copy()
                            new_variant[col_index] = item
                            temp.append(new_variant)
                    else:
                        temp.append(variant)
                row_variants = temp
            expanded_rows.extend(row_variants)
        cleaned_rows = expanded_rows
    else:
        cleaned_rows.extend(raw_rows)

    # Escape special characters after expansion
    final_rows = [cleaned_rows[0]]  # header
    for row in cleaned_rows[1:]:
        final_row = [escape_xml(cell) for cell in row]
        final_rows.append(final_row)

    # Write cleaned CSV
    with open(output_file, "w", newline="", encoding="utf-8") as outfile:
        writer = csv.writer(outfile)
        writer.writerows(final_rows)

    print(f"Cleaned CSV saved to {output_file}")

You can invoke the clean_csv function by passing the input CSV file path, output CSV file path, and optionally the columns to expand for nested values.

# File paths
input_file = r"C:\Users\Administrator\Desktop\input.csv"
output_file = r"C:\Users\Administrator\Desktop\cleaned_output.csv"

# Specify columns that may contain nested values
nested_columns = ["Products"]  # you can add more, e.g., ["Products", "Reviews"]

# Call the clean_csv function
clean_csv(input_file, output_file, nested_columns=nested_columns, nested_delimiter=";")

This function ensures your CSV is safe and clean before converting it to XML.

It helps:

  • Clean headers (valid XML names)
  • Handle empty cells
  • Expand nested values in any specified column(s)
  • Escape special characters for XML
  • Write a clean, UTF-8 CSV ready for XML conversion

Wrap Up

Converting CSV to XML in Python with Spire.XLS is both powerful and flexible. If you only need Excel-readable XML, saving directly to SpreadsheetML is the quickest option. If you need custom XML structures for APIs, migrations, or reporting, you can build standard XML using xml.etree.ElementTree.

Real-world CSVs are messy. They may have missing values, invalid headers, special characters, or nested lists. By integrating a preprocessing step (like the clean_csv function), you ensure that your XML output is both valid and meaningful.

Whether you’re preparing data for enterprise integration, creating structured reports, or exchanging information with legacy systems, this workflow gives you the best of both worlds: the simplicity of CSV and the structure of XML.

FAQs

Q1: Can I convert very large CSV files to XML?

Yes, but for very large files, you should use a streaming approach (processing rows one by one) to avoid memory issues.

Q2: Does Spire.XLS support direct conversion to XML without custom code?

Yes. Saving as SpreadsheetML is built-in. But for custom XML, you’ll need to write Python code as shown.

Q3: How do I handle special characters automatically?

Use the escape_xml helper or Python’s built-in xml.sax.saxutils.escape().

Q4: What if my CSV has nested data in multiple columns?

You can pass multiple column names into nested_columns when calling clean_csv.

Q5: Can I validate the generated XML?

Yes. After generating XML, validate it against an XSD schema if your system requires strict formatting.

Get a Free License

To fully experience the capabilities of Spire.XLS for Python without any evaluation limitations, you can request a free 30-day trial license.

Page 8 of 334
page 8