Python: Convert Word to Excel

2024-12-12 08:02:46 Written by support iceblue

While Word is a powerful tool for creating and formatting documents, it is not optimized for advanced data management and analysis. In contrast, Excel excels at handling data in tabular form, allowing users to perform calculations, create charts, and conduct thorough data analysis.

Generally, converting complex Word documents into Excel spreadsheets is not advisable, as Excel may struggle to preserve the original layout. However, if your Word document primarily consists of tables, converting it to Excel can be highly beneficial. This transformation unlocks Excel's advanced functions, formulas, and visualization tools, enabling you to organize your data more effectively and improve your reporting and decision-making capabilities. In this article, we will focus specifically on how to convert this kind of Word documents to Excel in Python using Spire.Office for Python.

Install Spire.Office for Python

This scenario requires Spire.Office for Python and plum-dispatch v1.7.4. They can be easily installed in your Windows through the following pip command.

pip install Spire.Office

Convert Word to Excel in Python

This process uses two libraries in the Spire.Office for Python package. They're Spire.XLS for Python and Spire.Doc for Python. The former is used to read and extract content from a Word document, and the latter is used to create an Excel document and write data in specific cells. To make this code example easy to understand, we have defined the following three custom methods that handle specific tasks:

  • ExportTableInExcel() - Export data from a Word table to specified Excel cells.
  • CopyContentInTable() - Copy content from a table cell in Word to an Excel cell.
  • CopyTextAndStyle() - Copy text with formatting from a Word paragraph to an Excel cell.

The following steps demonstrate how to export data from an entire Word document to an Excel worksheet using Spire.Office for Python.

  • Create a Document object to load a Word file.
  • Create a Worbbook object and add a worksheet named "WordToExcel" to it.
  • Traverse through all the sections in the Word document and all the document objects under a certain section, and then determine if a document object is a paragraph or a table.
  • If the document object is a paragraph, write the paragraph in a specified cell in Excel using CoypTextAndStyle() method.
  • If the document object is a table, export the table data from Word to Excel cells using ExportTableInExcel() method.
  • Auto fit the row height and column width in Excel so that the data within a cell will not exceed the bound of the cell.
  • Save the workbook to an Excel file using Workbook.SaveToFile() method.
  • Python
from spire.xls import *
from spire.doc import *

# Export data from Word table to Excel cells
def ExportTableInExcel(worksheet, row, table):
    for rowIndex in range(len(table.Rows)):
        tbRow = table.Rows[rowIndex]
        column = 1
        for cellIndex in range(len(tbRow.Cells)):
            tbCell = tbRow.Cells[cellIndex]
            cell = worksheet.Range[row, column]
            cell.BorderAround()
            CopyContentInTable(worksheet, tbCell, cell)
            column += 1
        row += 1
    return row

# Copy content from a Word table cell to an Excel cell
def CopyContentInTable(worksheet, tbCell, cell):
    newPara = Paragraph(tbCell.Document)
    for i in range(len(tbCell.ChildObjects)):
        documentObject = tbCell.ChildObjects[i]
        if isinstance(documentObject, Paragraph):
            paragraph = documentObject
            for cObj in range(len(paragraph.ChildObjects)):
                newPara.ChildObjects.Add(paragraph.ChildObjects[cObj].Clone())
            if i < len(tbCell.ChildObjects) - 1:
                newPara.AppendText("\n")
    CopyTextAndStyle(worksheet, cell, newPara)

# Copy text and style of a paragraph to a cell
def CopyTextAndStyle(worksheet, cell, paragraph):
    richText = cell.RichText
    richText.Text = paragraph.Text
    startIndex = 0
    for documentObject in range(len(paragraph.ChildObjects)):
        documentObject = paragraph.ChildObjects[documentObject]
        if isinstance(documentObject, TextRange):
            textRange = documentObject
            fontName = textRange.CharacterFormat.FontName
            isBold = textRange.CharacterFormat.Bold
            textColor = textRange.CharacterFormat.TextColor
            fontSize = textRange.CharacterFormat.FontSize
            textRangeText = textRange.Text
            strLength = len(textRangeText)
            font = worksheet.Workbook.CreateFont()
            font.Color = textColor
            font.IsBold = isBold
            font.Size = fontSize
            font.FontName = fontName
            endIndex = startIndex + strLength
            richText.SetFont(startIndex, endIndex, font)
            startIndex += strLength
        if isinstance(documentObject, DocPicture):
            picture = documentObject
            worksheet.Pictures.Add(cell.Row, cell.Column, picture.Image)
            worksheet.SetRowHeightInPixels(cell.Row, 1, picture.Image.Height)

    if paragraph.Format.HorizontalAlignment == HorizontalAlignment.Left:
        cell.Style.HorizontalAlignment = HorizontalAlignType.Left
    elif paragraph.Format.HorizontalAlignment == HorizontalAlignment.Center:
        cell.Style.HorizontalAlignment = HorizontalAlignType.Center
    elif paragraph.Format.HorizontalAlignment == HorizontalAlignment.Right:
        cell.Style.HorizontalAlignment = HorizontalAlignType.Right

# Create a Document object
doc = Document()
# Load a Word file
doc.LoadFromFile("C:/Users/Administrator/Desktop/Invoice.docx")

# Create a Workbook object
wb = Workbook()
# Remove the default worksheets
wb.Worksheets.Clear()

# Create a worksheet named "WordToExcel"
worksheet = wb.CreateEmptySheet("WordToExcel")
row = 1
column = 1

# Loop through the sections in the Word document
for sec_index in range(doc.Sections.Count):
    section = doc.Sections[sec_index] 
    # Loop through the document object under a certain section
    for obj_index in range(section.Body.ChildObjects.Count):
        documentObject = section.Body.ChildObjects[obj_index]
        # Determine if the object is a paragraph
        if isinstance(documentObject, Paragraph):
            cell = worksheet.Range[row, column]
            paragraph = documentObject
            # Copy paragraph from Word to a specific cell
            CopyTextAndStyle(worksheet, cell, paragraph)
            row += 1

        # Determine if the object is a table
        if isinstance(documentObject, Table):
            table = documentObject
            # Export table data from Word to Excel
            currentRow = ExportTableInExcel(worksheet, row, table)
            row = currentRow

# Auto fit row height and column width
worksheet.AllocatedRange.AutoFitRows()
worksheet.AllocatedRange.AutoFitColumns()

# Wrap text in cells
worksheet.AllocatedRange.IsWrapText = True

# Save the workbook to an Excel file
wb.SaveToFile("WordToExcel.xlsx", ExcelVersion.Version2013)
wb.Dispose()
doc.Dispose()

Python: Convert Word to Excel

Apply for a Temporary License

If you'd like to remove the evaluation message from the generated documents, or to get rid of the function limitations, please request a 30-day trial license for yourself.

When working with Excel in C#, copying rows, columns, and cells can feel like a simple task, but it often comes with a catch—how to keep the formatting intact. Whether you’re organizing data for a report, creating a presentation, or just trying to keep your spreadsheet looking sharp, maintaining the original look is crucial. In this article, we will demonstrate the methods to copy rows, columns, and cells in Excel while preserving the original formatting in C# using Spire.XLS for .NET.

Install Spire.XLS for .NET

To begin with, you need to add the DLL files included in the Spire.XLS for.NET package as references in your .NET project. The DLLs files can be either downloaded from this link or installed via NuGet.

PM> Install-Package Spire.XLS

Copy Rows in Excel with Formatting in C#

Copying rows in Excel while preserving their formatting can be efficiently achieved using the Worksheet.CopyRow(CellRange sourceRow, Worksheet destSheet, int destRowIndex, CopyRangeOptions copyOptions) method. This method enables you to duplicate rows either within the same worksheet or across different worksheets with precision. Additionally, you can control the copying behavior, such as copying all formatting, conditional formatting, data validations, styles, or even just the formula values, through the CopyRangeOptions parameter.

The following steps explain how to copy rows across different worksheets with formatting using Spire.XLS for .NET.

  • Create an object of the Workbook class.
  • Load an Excel file using the Workbook.LoadFromFile() method.
  • Get the source worksheet and the destination worksheet using the Workbook.Worksheets[index] property.
  • Get the desired row that you want to copy using the Worksheet.Rows[index] property.
  • Copy the row and its formatting from the source worksheet to the destination worksheet using the Worksheet.CopyRow(CellRange sourceRow, Worksheet destSheet, int destRowIndex, CopyRangeOptions copyOptions) method.
  • Copy the column widths of cells in the source row to the corresponding cells in the destination row.
  • Save the workbook to a file using the Workbook.SaveToFile() method.
  • C#
using Spire.Xls;

namespace CopyRows
{
    internal class Program
    {
        static void Main(string[] args)
        {
            // Create a Workbook object
            Workbook workbook = new Workbook();
            // Load an Excel file
            workbook.LoadFromFile("ContactList.xlsx");

            // Get the source worksheet
            Worksheet sheet1 = workbook.Worksheets[0];

            // Get the destination worksheet
            Worksheet sheet2 = workbook.Worksheets[1];

            // Get the desired row that you want to copy
            CellRange row = sheet1.Rows[0];

            // Copy the row from the source worksheet to the first row of the destination worksheet
            sheet1.CopyRow(row, sheet2, 1, CopyRangeOptions.All);

            int columns = sheet1.Columns.Length;

            // Copy the column widths of the cells in the source row to the corresponding cells in the destination row
            for (int i = 0; i < columns; i++)
            {
                double columnWidth = row.Columns[i].ColumnWidth;
                sheet2.Rows[0].Columns[i].ColumnWidth = columnWidth;
            }

            // Save the workbook to a file
            workbook.SaveToFile("CopyRow.xlsx", ExcelVersion.Version2016);
            workbook.Dispose();
        }
    }
}

Copy Rows in Excel with Formatting in C#

Copy Columns in Excel with Formatting in C#

Similarly, copying columns in Excel with formatting can be accomplished using the Worksheet.CopyColumn(CellRange sourceColumn, Worksheet destSheet, int destColIndex, CopyRangeOptions copyOptions) method. The detailed steps are as follows.

  • Create an object of the Workbook class.
  • Load an Excel file using the Workbook.LoadFromFile() method.
  • Get the source worksheet and the destination worksheet using the Workbook.Worksheets[index] property.
  • Get the desired column that you want to copy using the Worksheet.Columns[index] property.
  • Copy the column and its formatting from the source worksheet to the destination worksheet using the Worksheet.CopyColumn(CellRange sourceColumn, Worksheet destSheet, int destColIndex, CopyRangeOptions copyOptions) method.
  • Copy the row heights of cells in the source column to the corresponding cells in the destination column.
  • Save the workbook to a file using the Workbook.SaveToFile() method.
  • C#
using Spire.Xls;

namespace CopyColumns
{
    internal class Program
    {
        static void Main(string[] args)
        {
            // Create a Workbook object
            Workbook workbook = new Workbook();
            // Load an Excel file
            workbook.LoadFromFile("ContactList.xlsx");

            // Get the source worksheet
            Worksheet sheet1 = workbook.Worksheets[0];

            // Get the destination worksheet
            Worksheet sheet2 = workbook.Worksheets[1];

            // Get the desired column that you want to copy
            CellRange column = sheet1.Columns[0];

            // Copy the column from the source worksheet to the first column of the destination worksheet
            sheet1.CopyColumn(column, sheet2, 1, CopyRangeOptions.All);

            int rows = column.Rows.Length;

            // Copy the row heights of cells in the source column to the corresponding cells in the destination column
            for (int i = 0; i < rows; i++)
            {
                double rowHeight = column.Rows[i].RowHeight;
                sheet2.Columns[0].Rows[i].RowHeight = rowHeight;
            }

            // Save the workbook to a file
            workbook.SaveToFile("CopyColumn.xlsx", ExcelVersion.Version2016);
            workbook.Dispose();
        }
    }
}

Copy Columns in Excel with Formatting in C#

Copy Cells in Excel with Formatting in C#

In addition to copying rows and columns in Excel with formatting, Spire.XLS for .NET also allows copying cell ranges with formatting using the CellRange.Copy(CellRange destRange, CopyRangeOptions copyOptions) method. The detailed steps are as follows.

  • Create an object of the Workbook class.
  • Load an Excel file using the Workbook.LoadFromFile() method.
  • Get the source worksheet and the destination worksheet using the Workbook.Worksheets[index] property.
  • Get the source cell range and the destination cell range using the Worksheet.Range[] property.
  • Copy the source cell range and its formatting from the source worksheet to the destination cell range in the destination worksheet using the CellRange.Copy(CellRange destRange, CopyRangeOptions copyOptions) method.
  • Copy the row heights and column widths of the source cell range to the destination cell range.
  • Save the workbook to a file using the Workbook.SaveToFile() method.
  • C#
using Spire.Xls;

namespace CopyCells
{
    internal class Program
    {
        static void Main(string[] args)
        {
            // Create a Workbook object
            Workbook workbook = new Workbook();
            // Load an Excel file
            workbook.LoadFromFile("ContactList.xlsx");

            // Get the source worksheet
            Worksheet sheet1 = workbook.Worksheets[0];
            // Get the destination worksheet
            Worksheet sheet2 = workbook.Worksheets[1];

            // Get the source cell range
            CellRange range1 = sheet1.Range["A1:E7"];
            // Get the destination cell range
            CellRange range2 = sheet2.Range["A1:E7"];

            // Copy the source cell range from the source worksheet to the destination cell range in the destination worksheet
            range1.Copy(range2, CopyRangeOptions.All);

            // Copy the row heights and column widths of the source cell range to the destination cell range
            for (int i = 0; i < range1.Rows.Length; i++)
            {
                CellRange row = range1.Rows[i];
                for (int j = 0; j < row.Columns.Length; j++)
                {
                    CellRange column = row.Columns[j];
                    range2.Rows[i].Columns[j].ColumnWidth = column.ColumnWidth;
                    range2.Rows[i].RowHeight = row.RowHeight;
                }
            }

            // Save the workbook to a file
            workbook.SaveToFile("CopyCells.xlsx", ExcelVersion.Version2016);
            workbook.Dispose();
        }
    }
}

Copy Cells in Excel with Formatting in C#

Apply for a Temporary License

If you'd like to remove the evaluation message from the generated documents, or to get rid of the function limitations, please request a 30-day trial license for yourself.

XML is often used for data interchange between different systems, while Excel is a widely recognized format for data analysis and reporting. By converting XML data to Excel, you can leverage Excel's powerful features to analyze and visualize the data more effectively. This conversion process is essential in various industries, including finance, healthcare, and e-commerce.

In this article, you will learn how to convert XML to Excel and PDF in C# using Spire.XLS for .NET.

Install Spire.XLS for .NET

To begin with, you need to add the DLL files included in the Spire.XLS for .NET package as references in your .NET project. The DLL files can be either downloaded from this link or installed via NuGet.

PM> Install-Package Spire.XLS

Understanding XML Structure: Elements, Attributes, and Data

Before converting XML to Excel, it's crucial to understand the structure of XML files. XML is a markup language that uses tags to define elements, attributes, and data. Here’s a breakdown of these components:

  • Elements: These are the building blocks of XML. They are defined by start and end tags and can contain data or other elements.
<person>
    <name>John Doe</name>
    <age>30</age>
</person>
  • Attributes: These provide additional information about elements. They are specified within the start tag of an element.
<person id="1">
    <name>John Doe</name>
    <age>30</age>
</person>
  • Data: This is the content enclosed within the start and end tags of an element.

Understanding these components will help you map XML data to Excel effectively.

Convert XML to Excel in C#

In .NET, you can use the System.Xml.Linq namespace, which provides classes for working with XML files. The primary class used is XDocument, which allows you to load, navigate, and manipulate XML documents effortlessly.

Here's an example:

  • C#
using System;
using System.Xml.Linq;

class Program
{
    static void Main()
    {
        // Load the XML file
        XDocument doc = XDocument.Load("data.xml");
        XElement root = doc.Root;

        // Iterate through elements
        foreach (XElement person in root.Elements("person"))
        {
            string name = person.Element("name")?.Value;
            string age = person.Element("age")?.Value;

            // Output the name and age
            Console.WriteLine($"Name: {name}, Age: {age}");
        }
    }
}

After parsing the XML data, the next step is to map it to an Excel worksheet. You can use Spire.XLS for .NET to create a new workbook, input data into specific cells, and apply various styles and formatting options. These include auto-fitting column widths, adjusting text alignment, and making the header bold.

To convert XML to Excel in C#, follow these steps:

  • Utilize the System.Xml.Linq library to extract data from the XML file.
  • Create a Workbook object.
  • Add a worksheet using the Workbook.Worksheets.Add() method.
  • Write the extracted data into the worksheet cells using the Worksheet.SetValue() method.
  • Apply styles and formatting to enhance the appearance of the worksheet.
  • Save the workbook to an Excel file using the Workbook.SaveToFile() method.

The following code demonstrates an efficient and advanced method for reading data from XML and importing it into an Excel file.

  • C#
using Spire.Xls;
using System.Xml.Linq;

namespace ConvertXmlToExcel
{
    class Program
    {
        static void Main(string[] args)
        {

            // Create a Workbook object
            Workbook workbook = new Workbook();

            // Remove default worksheets
            workbook.Worksheets.Clear();

            // Add a worksheet and name it
            Worksheet worksheet = workbook.Worksheets.Add("Books");

            // Load an XML file
            XDocument xmlDoc = XDocument.Load(@"C:\Users\Administrator\Desktop\Books.xml");
            XElement xmlRoot = xmlDoc.Root;

            // Get the first "book" element
            XElement firstBook = xmlRoot.Element("book");

            // Extract header information and convert it into a list
            var headers = firstBook.Elements().ToList();

            // Write header to Excel
            for (int colIndex = 0; colIndex < headers.Count; colIndex++)
            {
                string headerText = headers[colIndex].Name.LocalName;
                worksheet.SetValue(1, colIndex + 1, headerText);
            }

            // Write other data to Excel by iterating over each book element and each data node within it
            int rowIndex = 2;
            foreach (XElement book in xmlRoot.Elements("book"))
            {
                var dataNodes = book.Elements().ToList();
                for (int colIndex = 0; colIndex < dataNodes.Count; colIndex++)
                {
                    string value = dataNodes[colIndex].Value;
                    worksheet.SetValue(rowIndex, colIndex + 1, value);
                }
                rowIndex++;
            }

            // Set column width
            worksheet.AllocatedRange.AutoFitColumns();

            // Set alignment
            worksheet.AllocatedRange.HorizontalAlignment = HorizontalAlignType.Left;

            // Set font style
            worksheet.Range["A1:F1"].Style.Font.IsBold = true;

            // Save the workbook to an Excel file
            workbook.SaveToFile("output/XmlToExcel.xlsx");

            // Dispose resources
            workbook.Dispose();
        }
    }
}

The result Excel file containing the data extracted from an XML file

Convert XML to PDF in C#

The previous example effectively imports data from an XML file into an Excel worksheet. This worksheet can subsequently be converted to a PDF file using the Worksheet.SaveToPdf() method. To ensure a well-structured PDF, you may want to adjust page layout settings, such as margins and the preservation of gridlines, during the conversion process.

Here are the steps to convert XML to PDF using C#:

  • Use the System.Xml.Linq library to retrieve data from the XML file.
  • Create a Workbook object.
  • Add a worksheet with the Workbook.Worksheets.Add() method.
  • Populate the worksheet cells with data extracted from the XML file using the Worksheet.SetValue() method.
  • Apply styles and formatting to improve the worksheet's appearance.
  • Configure page settings using properties from the PageSetup object, accessible via Worksheet.PageSetup.
  • Save the worksheet as a PDF file using the Worksheet.SaveToPdf() method.

The following code snippet illustrates how to import data from XML into a worksheet and then save that worksheet as a PDF file.

  • C#
using Spire.Xls;
using Spire.Xls.Core;
using System.Xml.Linq;

namespace ConvertXmlToPdf
{
    class Program
    {
        static void Main(string[] args)
        {
            // Create a Workbook object
            Workbook workbook = new Workbook();

            // Remove default worksheets
            workbook.Worksheets.Clear();

            // Add a worksheet and name it
            Worksheet worksheet = workbook.Worksheets.Add("Books");

            // Load an XML file
            XDocument xmlDoc = XDocument.Load(@"C:\Users\Administrator\Desktop\Books.xml");
            XElement xmlRoot = xmlDoc.Root;

            // Get the first "book" element
            XElement firstBook = xmlRoot.Element("book");

            // Extract header information and convert it into a list
            var headers = firstBook.Elements().ToList();

            // Write header to Excel
            for (int colIndex = 0; colIndex < headers.Count; colIndex++)
            {
                string headerText = headers[colIndex].Name.LocalName;
                worksheet.SetValue(1, colIndex + 1, headerText);
            }

            // Write other data to Excel by iterating over each book element and each data node within it
            int rowIndex = 2;
            foreach (XElement book in xmlRoot.Elements("book"))
            {
                var dataNodes = book.Elements().ToList();
                for (int colIndex = 0; colIndex < dataNodes.Count; colIndex++)
                {
                    string value = dataNodes[colIndex].Value;
                    worksheet.SetValue(rowIndex, colIndex + 1, value);
                }
                rowIndex++;
            }

            // Set column width
            worksheet.AllocatedRange.AutoFitColumns();

            // Set alignment
            worksheet.AllocatedRange.HorizontalAlignment = HorizontalAlignType.Left;

            // Set font style
            worksheet.Range["A1:F1"].Style.Font.IsBold = true;

            // Fit worksheet on one page
            workbook.ConverterSetting.SheetFitToPage = true;

            // Get the PageSetup object
            PageSetup pageSetup = worksheet.PageSetup;

            // Set page margins
            pageSetup.TopMargin = 0.3;
            pageSetup.BottomMargin = 0.3;
            pageSetup.LeftMargin = 0.3;
            pageSetup.RightMargin = 0.3;

            // Preserve gridlines 
            pageSetup.IsPrintGridlines = true;

            // Save the worksheet to a PDF file
            worksheet.SaveToPdf("output/XmlToPdf.pdf");

            // Dispose resources
            workbook.Dispose();
        }
    }
}

The result PDF file containing the data extracted from an XML file

Apply for a Temporary License

If you'd like to remove the evaluation message from the generated documents, or to get rid of the function limitations, please request a 30-day trial license for yourself.

Page 1 of 307
page 1