Monday, 25 December 2023 01:32

Python: Create a Bar Chart in Excel

A bar chart is a type of graph that represents categorical data using rectangular bars. It is somewhat like a column chart, but with bars that extend horizontally from the Y-axis. The length of each bar corresponds to the value represented by a particular category or group, and changes, trends, or rankings can be quickly identified by comparing the lengths of the bars. In this article, you will learn how to create a clustered or stacked bar chart in Excel in Python using Spire.XLS for Python.

Install Spire.XLS for Python

This scenario requires Spire.XLS for Python and plum-dispatch v1.7.4. They can be easily installed in your VS Code through the following pip command.

pip install Spire.XLS

If you are unsure how to install, please refer to this tutorial: How to Install Spire.XLS for Python in VS Code

Create a Clustered Bar Chart in Excel in Python

The Worksheet.Chart.Add(ExcelChartType chartType) method provided by Spire.XLS for Python allows to add a chart to a worksheet. To add a clustered bar chart in Excel, you can set the chart type to BarClustered. The following are the steps.

  • Create a Workbook object.
  • Get a specific worksheet using Workbook.Worksheets[index] property.
  • Add chart data to specified cells and set the cell styles.
  • Add a clustered bar char to the worksheet using Worksheet.Chart.Add(ExcelChartType.BarClustered) method.
  • Set data range for the chart using Chart.DataRange property.
  • Set position, title, category axis and value axis for the chart.
  • Save the result file using Workbook.SaveToFile() method.
  • Python
from spire.common import *
from spire.xls import *

# Create a Workbook instance
workbook = Workbook()

# Get the first sheet and set its name
sheet = workbook.Worksheets[0]
sheet.Name = "ClusteredBar"

# Add chart data to specified cells
sheet.Range["A1"].Value = "Quarter"
sheet.Range["A2"].Value = "Q1"
sheet.Range["A3"].Value = "Q2"
sheet.Range["A4"].Value = "Q3"
sheet.Range["A5"].Value = "Q4"
sheet.Range["B1"].Value = "Team A"
sheet.Range["B2"].NumberValue = 3000
sheet.Range["B3"].NumberValue = 8000
sheet.Range["B4"].NumberValue = 9000
sheet.Range["B5"].NumberValue = 8500
sheet.Range["C1"].Value = "Team B"
sheet.Range["C2"].NumberValue = 7000
sheet.Range["C3"].NumberValue = 2000
sheet.Range["C4"].NumberValue = 5000
sheet.Range["C5"].NumberValue = 4200

# Set cell style
sheet.Range["A1:C1"].RowHeight = 18
sheet.Range["A1:C1"].Style.Color = Color.get_Black()
sheet.Range["A1:C1"].Style.Font.Color = Color.get_White()
sheet.Range["A1:C1"].Style.Font.IsBold = True
sheet.Range["A1:C1"].Style.VerticalAlignment = VerticalAlignType.Center
sheet.Range["A1:C1"].Style.HorizontalAlignment = HorizontalAlignType.Center
sheet.Range["A2:A5"].Style.HorizontalAlignment = HorizontalAlignType.Center
sheet.Range["B2:C5"].Style.NumberFormat = "\"$\"#,##0"

# Add a clustered bar chart to the sheet
chart = sheet.Charts.Add(ExcelChartType.BarClustered)

# Set data range of the chart
chart.DataRange = sheet.Range["A1:C5"]
chart.SeriesDataFromRange = False

# Set position of the chart
chart.LeftColumn = 1
chart.TopRow = 6
chart.RightColumn = 11
chart.BottomRow = 29

# Set and format chart title
chart.ChartTitle = "Team Sales Comparison per Quarter"
chart.ChartTitleArea.IsBold = True
chart.ChartTitleArea.Size = 12

# Set and format category axis
chart.PrimaryCategoryAxis.Title = "Country"
chart.PrimaryCategoryAxis.Font.IsBold = True
chart.PrimaryCategoryAxis.TitleArea.IsBold = True
chart.PrimaryCategoryAxis.TitleArea.TextRotationAngle = 90

# Set and format value axis
chart.PrimaryValueAxis.Title = "Sales (in Dollars)"
chart.PrimaryValueAxis.HasMajorGridLines = False
chart.PrimaryValueAxis.MinValue = 1000
chart.PrimaryValueAxis.TitleArea.IsBold = True

# Show data labels for data points
for cs in chart.Series:
    cs.Format.Options.IsVaryColor = True
    cs.DataPoints.DefaultDataPoint.DataLabels.HasValue = True

# Set legend position
chart.Legend.Position = LegendPositionType.Top

#Save the result file
workbook.SaveToFile("ClusteredBarChart.xlsx", ExcelVersion.Version2016)
workbook.Dispose()

Python: Create a Bar Chart in Excel

Create a Stacked Bar Chart in Excel in Python

To create a stacked bar chart, you just need to change the Excel chart type to BarStacked. The following are the steps.

  • Create a Workbook object.
  • Get a specific worksheet using Workbook.Worksheets[index] property.
  • Add chart data to specified cells and set the cell styles.
  • Add a clustered bar char to the worksheet using Worksheet.Chart.Add(ExcelChartType.BarStacked) method.
  • Set data range for the chart using Chart.DataRange property.
  • Set position, title, category axis and value axis for the chart.
  • Save the result file using Workbook.SaveToFile() method.
  • Python
from spire.common import *
from spire.xls import *

# Create a Workbook instance
workbook = Workbook()

# Get the first sheet and set its name
sheet = workbook.Worksheets[0]
sheet.Name = "StackedBar"

# Add chart data to specified cells
sheet.Range["A1"].Value = "Quarter"
sheet.Range["A2"].Value = "Q1"
sheet.Range["A3"].Value = "Q2"
sheet.Range["A4"].Value = "Q3"
sheet.Range["A5"].Value = "Q4"
sheet.Range["B1"].Value = "Team A"
sheet.Range["B2"].NumberValue = 3000
sheet.Range["B3"].NumberValue = 8000
sheet.Range["B4"].NumberValue = 9000
sheet.Range["B5"].NumberValue = 8500
sheet.Range["C1"].Value = "Team B"
sheet.Range["C2"].NumberValue = 7000
sheet.Range["C3"].NumberValue = 2000
sheet.Range["C4"].NumberValue = 5000
sheet.Range["C5"].NumberValue = 4200

# Set cell style
sheet.Range["A1:C1"].RowHeight = 18
sheet.Range["A1:C1"].Style.Color = Color.get_Black()
sheet.Range["A1:C1"].Style.Font.Color = Color.get_White()
sheet.Range["A1:C1"].Style.Font.IsBold = True
sheet.Range["A1:C1"].Style.VerticalAlignment = VerticalAlignType.Center
sheet.Range["A1:C1"].Style.HorizontalAlignment = HorizontalAlignType.Center
sheet.Range["A2:A5"].Style.HorizontalAlignment = HorizontalAlignType.Center
sheet.Range["B2:C5"].Style.NumberFormat = "\"$\"#,##0"

# Add a clustered bar chart to the sheet
chart = sheet.Charts.Add(ExcelChartType.BarStacked)

# Set data range of the chart
chart.DataRange = sheet.Range["A1:C5"]
chart.SeriesDataFromRange = False

# Set position of the chart
chart.LeftColumn = 1
chart.TopRow = 6
chart.RightColumn = 11
chart.BottomRow = 29

# Set and format chart title
chart.ChartTitle = "Team Sales Comparison per Quarter"
chart.ChartTitleArea.IsBold = True
chart.ChartTitleArea.Size = 12

# Set and format category axis
chart.PrimaryCategoryAxis.Title = "Country"
chart.PrimaryCategoryAxis.Font.IsBold = True
chart.PrimaryCategoryAxis.TitleArea.IsBold = True
chart.PrimaryCategoryAxis.TitleArea.TextRotationAngle = 90

# Set and format value axis
chart.PrimaryValueAxis.Title = "Sales (in Dollars)"
chart.PrimaryValueAxis.HasMajorGridLines = False
chart.PrimaryValueAxis.MinValue = 1000
chart.PrimaryValueAxis.TitleArea.IsBold = True

# Show data labels for data points
for cs in chart.Series:
    cs.Format.Options.IsVaryColor = True
    cs.DataPoints.DefaultDataPoint.DataLabels.HasValue = True

# Set legend position
chart.Legend.Position = LegendPositionType.Top

#Save the result file
workbook.SaveToFile("StackedBarChart.xlsx", ExcelVersion.Version2016)
workbook.Dispose()

Python: Create a Bar Chart in Excel

Apply for a Temporary License

If you'd like to remove the evaluation message from the generated documents, or to get rid of the function limitations, please request a 30-day trial license for yourself.

Monday, 25 December 2023 01:23

Python: Convert RTF to PDF, HTML

RTF (Rich Text Format) is a versatile file format that can be opened and viewed by various word processing software. It supports a wide range of text formatting options, such as font style, size, color, tables, images, and more. When working with RTF files, you may sometimes need to convert them to PDF files for better sharing and printing, or to HTML format for publishing on the web. In this article, you will learn how to convert RTF to PDF or HTML with Python using Spire.Doc for Python.

Install Spire.Doc for Python

This scenario requires Spire.Doc for Python and plum-dispatch v1.7.4. They can be easily installed in your VS Code through the following pip commands.

pip install Spire.Doc

If you are unsure how to install, please refer to this tutorial: How to Install Spire.Doc for Python in VS Code

Convert RTF to PDF in Python

To convert an RTF file to PDF, simply load a file with .rtf extension and then save it as a PDF file using Document.SaveToFile(fileName, FileFormat.PDF) method. The following are the detailed steps.

  • Create a Document object.
  • Load an RTF file using Document.LoadFromFile() method.
  • Save the RTF file as a PDF file using Document.SaveToFile(fileName, FileFormat.PDF) method.
  • Python
from spire.doc import *
from spire.doc.common import *

inputFile = "input.rtf"
outputFile = "RtfToPDF.pdf"

# Create a Document object
doc = Document()

# Load an RTF file from disk
doc.LoadFromFile(inputFile)

# Save the RTF file as a PDF file
doc.SaveToFile(outputFile, FileFormat.PDF)
doc.Close()

Python: Convert RTF to PDF, HTML

Convert RTF to HTML in Python

Spire.Doc for Python also allows you to use the Document.SaveToFile(fileName, FileFormat.Html) method to convert the loaded RTF file to HTML format. The following are the detailed steps.

  • Create a Document object.
  • Load an RTF file using Document.LoadFromFile() method.
  • Save the RTF file in HTML format using Document.SaveToFile(fileName, FileFormat.Html) method.
  • Python
from spire.doc import *
from spire.doc.common import *

inputFile = "input.rtf"
outputFile = "RtfToHtml.html"
               
# Create a Document object
doc = Document()

# Load an RTF file from disk
doc.LoadFromFile(inputFile)

# Save the RTF file in HTML format
doc.SaveToFile(outputFile, FileFormat.Html)
doc.Close()

Python: Convert RTF to PDF, HTML

Apply for a Temporary License

If you'd like to remove the evaluation message from the generated documents, or to get rid of the function limitations, please request a 30-day trial license for yourself.

Friday, 22 December 2023 09:39

Read Excel Files with Python

Excel files (spreadsheets) are used by people worldwide for organizing, analyzing, and storing tabular data. Due to their popularity, developers frequently encounter situations where they need to extract data from Excel or create reports in Excel format. Being able to read Excel files with Python opens up a wide range of possibilities for data processing and automation. In this article, you will learn how to read data (text or number values) from a cell, a cell range, or an entire worksheet by using the Spire.XLS for Python library.

Python Library for Reading Excel

Spire.XLS for Python is a reliable enterprise-level Python library for creating, writing, reading and editing Excel documents (XLS, XLSX, XLSB, XLSM, ODS) in a Python application. It provides a comprehensive set of interfaces, classes and properties that allow programmers to read and write Excel files with ease. Specifically, a cell in a worksheet can be accessed using the Worksheet.Range property and the value of the cell can be obtained using the CellRange.Value property.

The library is easy to install by running the following pip command. If you’d like to manually import the necessary dependencies, refer to How to Install Spire.XLS for Python in VS Code

pip install Spire.XLS

Classes and Properties in Spire.XLS for Python API

  • Workbook class: Represents an Excel workbook model, which you can use to create a workbook from scratch or load an existing Excel document and do modification on it.
  • Worksheet class: Represents a worksheet in a workbook.
  • CellRange class: Represents a specific cell or a cell range in a workbook.
  • Worksheet.Range property: Gets a cell or a range and returns an object of CellRange class.
  • Worksheet.AllocatedRange property: Gets the cell range containing data and returns an object of CellRange class.
  • CellRange.Value property: Gets the number value or text value of a cell. But if a cell has a formula, this property returns the formula instead of the result of the formula.

Read Data of a Particular Cell in Python

With Spire.XLS for Python, you can easily get the value of a certain cell by using the CellRange.Value property. The steps to read data of a particular Excel cell in Python are as follows.

  • Instantiate Workbook class
  • Load an Excel document using LoadFromFile method.
  • Get a specific worksheet using Workbook.Worksheets[index] property.
  • Get a specific cell using Worksheet.Range property.
  • Get the value of the cell using CellRange.Value property
  • Python
from spire.xls import *
from spire.xls.common import *

# Create a Workbook object
wb = Workbook()

# Load an Excel file
wb.LoadFromFile("C:\\Users\\Administrator\\Desktop\\Data.xlsx")

# Get a specific worksheet
sheet = wb.Worksheets[0]

# Get a specific cell
certainCell = sheet.Range["D9"]

# Get the value of the cell
print("D9 has the value: " + certainCell.Value)

Read Excel Files with Python

Read Data from a Cell Range in Python

We already know how to get the value of a cell, to get the values of a range of cells, such as certain rows or columns, we just need to use loop statements to iterate through the cells, and then extract them one by one. The steps to read data from an Excel cell range in Python are as follows.

  • Instantiate Workbook class
  • Load an Excel document using LoadFromFile method.
  • Get a specific worksheet using Workbook.Worksheets[index] property.
  • Get a specific cell range using Worksheet.Range property.
  • Use for loop statements to retrieve each cell in the range, and get the value of a specific cell using CellRange.Value property
  • Python
from spire.xls import *
from spire.xls.common import *

# Create a Workbook object
wb = Workbook()

# Load an existing Excel file
wb.LoadFromFile("C:\\Users\\Administrator\\Desktop\\Data.xlsx")

# Get a specific worksheet
sheet = wb.Worksheets[0]

# Get a cell range
cellRange = sheet.Range["A2:H5"]

# Iterate through the rows
for i in range(len(cellRange.Rows)):

    # Iterate through the columns
    for j in range(len(cellRange.Rows[i].Columns)):

        # Get data of a specific cell
        print(cellRange[i + 2, j + 1].Value + "  ", end='')

    print("")

Read Excel Files with Python

Read Data from an Excel Worksheet in Python

Spire.XLS for Python offers the Worksheet.AllocatedRange property to automatically obtain the cell range that contains data from a worksheet. Then, we traverse the cells within the cell range rather than the entire worksheet, and retrieve cell values one by one. The following are the steps to read data from an Excel worksheet in Python.

  • Instantiate Workbook class
  • Load an Excel document using LoadFromFile method.
  • Get a specific worksheet using Workbook.Worksheets[index] property.
  • Get the cell range containing data from the worksheet using Worksheet.AllocatedRange property.
  • Use for loop statements to retrieve each cell in the range, and get the value of a specific cell using CellRange.Value property
  • Python
from spire.xls import *
from spire.xls.common import *

# Create a Workbook object
wb = Workbook()

# Load an existing Excel file
wb.LoadFromFile("C:\\Users\\Administrator\\Desktop\\Data.xlsx")

# Get the first worksheet
sheet = wb.Worksheets[0]

# Get the cell range containing data
locatedRange = sheet.AllocatedRange

# Iterate through the rows
for i in range(len(sheet.Rows)):

    # Iterate through the columns
    for j in range(len(locatedRange.Rows[i].Columns)):

        # Get data of a specific cell
        print(locatedRange[i + 1, j + 1].Value + "  ", end='')

    print("")

Read Excel Files with Python

Read Value Rather than Formula in a Cell in Python

As mentioned earlier, when a cell contains a formula, the CellRange.Value property returns the formula itself, not the value of the formula. If you want to get the value, you need to use the str(CellRange.FormulaValue) method. The following are the steps to read value rather than formula in an Excel cell in Python.

  • Instantiate Workbook class
  • Load an Excel document using LoadFromFile method.
  • Get a specific worksheet using Workbook.Worksheets[index] property.
  • Get a specific cell using Worksheet.Range property.
  • Determine if the cell has a formula using CellRange.HasFormula property.
  • Get the formula value of the cell using str(CellRange.FormulaValue) method
  • Python
from spire.xls import *
from spire.xls.common import *

# Create a Workbook object
wb = Workbook()

# Load an Excel file
wb.LoadFromFile("C:\\Users\\Administrator\\Desktop\\Formula.xlsx")

# Get a specific worksheet
sheet = wb.Worksheets[0]

# Get a specific cell
certainCell = sheet.Range["D4"]

# Determine if the cell has formula
if(certainCell.HasFormula):

    # Get the formula value of the cell
    print(str(certainCell.FormulaValue))

Read Excel Files with Python

Conclusion

In this blog post, we learned how to read data from cells, cell regions, and worksheets in Python with the help of Spire.XLS for Python API. We also discussed how to determine if a cell has a formula and how to get the value of the formula. This library supports extraction of many other elements in Excel such as images, hyperlinks and OEL objects. Check out our online documentation for more tutorials. If you have any questions, please contact us by email or on the forum.

See Also

Friday, 22 December 2023 01:04

Python: Convert PDF to PDF/A and Vice Versa

PDF/A is a specialized format designed specifically for long-term archiving and preservation of electronic documents. It guarantees that the content, structure, and visual appearance of the documents remain unchanged over time. By converting PDF files to PDF/A format, you ensure the long-term accessibility of the documents, regardless of software, operating systems, or future technological advancements. Conversely, converting PDF/A files to standard PDF format makes it easier to edit, share, and collaborate on the documents, ensuring better compatibility across different applications, devices, and platforms. In this article, we will explain how to convert PDF to PDF/A and vice versa in Python using Spire.PDF for Python.

Install Spire.PDF for Python

This scenario requires Spire.PDF for Python and plum-dispatch v1.7.4. They can be easily installed in your VS Code through the following pip command.

pip install Spire.PDF

If you are unsure how to install, please refer to this tutorial: How to Install Spire.PDF for Python in VS Code

Convert PDF to PDF/A in Python

The PdfStandardsConverter class provided by Spire.PDF for Python supports converting PDF to various PDF/A formats, including PDF/A-1a, 2a, 3a, 1b, 2b and 3b. Moreover, it also supports converting PDF to PDF/X-1a:2001. The detailed steps are as follows.

  • Specify the input file path and output folder.
  • Create a PdfStandardsConverter object and pass the input file path to the constructor of the class as a parameter.
  • Convert the input file to a Pdf/A-1a conformance file using PdfStandardsConverter.ToPdfA1A() method.
  • Convert the input file to a Pdf/A-1b file using PdfStandardsConverter.ToPdfA1B() method.
  • Convert the input file to a Pdf/A-2a file using PdfStandardsConverter.ToPdfA2A() method.
  • Convert the input file to a Pdf/A-2b file using PdfStandardsConverter.ToPdfA2B() method.
  • Convert the input file to a Pdf/A-3a file using PdfStandardsConverter.ToPdfA3A() method.
  • Convert the input file to a Pdf/A-3b file using PdfStandardsConverter.ToPdfA3B() method.
  • Convert the input file to a PDF/X-1a:2001 file using PdfStandardsConverter.ToPdfX1A2001() method.
  • Python
from spire.pdf.common import *
from spire.pdf import *

# Specify the input file path and output folder
inputFile = "Sample.pdf"
outputFolder = "Output/"

# Create an object of the PdfStandardsConverter class
converter = PdfStandardsConverter(inputFile)

# Convert the input file to PdfA1A
converter.ToPdfA1A(outputFolder + "ToPdfA1A.pdf")

# Convert the input file to PdfA1B
converter.ToPdfA1B(outputFolder + "ToPdfA1B.pdf")

# Convert the input file to PdfA2A
converter.ToPdfA2A(outputFolder + "ToPdfA2A.pdf")

# Convert the input file to PdfA2B
converter.ToPdfA2B(outputFolder + "ToPdfA2B.pdf")

# Convert the input file to PdfA3A
converter.ToPdfA3A(outputFolder + "ToPdfA3A.pdf")

# Convert the input file to PdfA3B
converter.ToPdfA3B(outputFolder + "ToPdfA3B.pdf")

# Convert the input file to PDF/X-1a:2001
converter.ToPdfX1A2001(outputFolder + "ToPdfX1a.pdf")

Python: Convert PDF to PDF/A and Vice Versa

Convert PDF/A to PDF in Python

To convert a PDF/A file back to a standard PDF format, you need to create a new standard PDF file, and then draw the page content of the PDF/A file to the newly created PDF file. The detailed steps are as follows.

  • Create a PdfDocument object.
  • Load a PDF/A file using PdfDocument.LoadFromFile() method.
  • Create a PdfNewDocument object and set its compression level as none.
  • Loop through the pages in the original PDF/A file.
  • Add pages to the newly created PDF using PdfDocumentBase.Pages.Add() method.
  • Draw the page content of the original PDF/A file to the corresponding pages of the newly created PDF using PdfPageBase.CreateTemplate.Draw() method.
  • Create a Stream object and then save the new PDF to the stream using PdfNewDocument.Save() method.
  • Python
from spire.pdf.common import *
from spire.pdf import *

# Specify the input and output file paths
inputFile = "Output/ToPdfA1A.pdf"
outputFile = "PdfAToPdf.pdf"

# Create an object of the PdfDocument class
doc = PdfDocument()
# Load a PDF file
doc.LoadFromFile(inputFile)

# Create a new standard PDF file
newDoc = PdfNewDocument()
newDoc.CompressionLevel = PdfCompressionLevel.none

# Add pages to the newly created PDF and draw the page content of the loaded PDF onto the corresponding pages of the newly created PDF
for i in range(doc.Pages.Count):
    page = doc.Pages.get_Item(i)
    size = page.Size
    p = newDoc.Pages.Add(size, PdfMargins(0.0))
    page.CreateTemplate().Draw(p, 0.0, 0.0)   

# Save the new PDF to a PDF file   
fileStream = Stream(outputFile)
newDoc.Save(fileStream)
fileStream.Close()
newDoc.Close(True)

Python: Convert PDF to PDF/A and Vice Versa

Apply for a Temporary License

If you'd like to remove the evaluation message from the generated documents, or to get rid of the function limitations, please request a 30-day trial license for yourself.

We are thrilled to announce the launch of Spire.OCR for Java, an OCR (Optical Character Recognition) product designed for efficient text extraction from images. Developed by E-iceblue, Spire.OCR for Java is now available to empower developers with seamless integration of OCR functionalities into their Java applications.

Spire.OCR for Java

Spire.OCR for Java is a professional OCR library to read text from Images in JPG, PNG, GIF, BMP and TIFF formats. Developers can easily add OCR functionalities on Java applications (J2SE and J2EE). It supports commonly used image formats and provides functionalities like reading multiple characters and fonts from images, bold and italic styles and much more.

Click the following link to get Spire.OCR for Java:
How to sue Spire.OCR for Java:

Contact Us

We are pleased to announce the release of Spire.PDFViewer 7.12.3. This version supports the interface zoom effect by using Ctrl + scroll wheel in WinForm projects. What’s more, it also fixes the issue that text content could not be displayed. More details are listed below.

Here is a list of changes made in this release

Category ID Description
New feature SPIREPDFVIEWER-579 Supports the interface zoom effect by using Ctrl + scroll wheel in WinForm projects.
this.KeyPreview = true;
this.KeyDown += new System.Windows.Forms.KeyEventHandler(this.Form1_KeyDown);
this.KeyUp += new System.Windows.Forms.KeyEventHandler(Form1_KeyUp);
this.MouseWheel += new System.Windows.Forms.MouseEventHandler(Form1_MouseWheel);
private bool m_PressCtrl = false;
private float m_ZoomFactor = 1.0f;
private void Form1_KeyDown(object sender, KeyEventArgs e)
{
	m_PressCtrl = e.Control;
}       
private void Form1_KeyUp(object sender, KeyEventArgs e)
{
	m_PressCtrl = false;
}        
private float[] array = new float[] { 0.5f, 0.75f, 1f, 1.25f, 1.5f, 2f, 4f };
private int index = 2;     
private void Form1_MouseWheel(object sender, MouseEventArgs e)
{
	if (m_PressCtrl)
	{
		if (e.Delta > 0)
		{
			index = index < 6 ? index + 1 : 6;
		}
		if (e.Delta < 0)
		{
			index = index == 0 ? 0 : index - 1;
		}
		this.pdfViewer1.SetZoomFactor(array[index]);
	}
}
Bug SPIREPDFVIEWER -577 Fixes the issue that text content could not be displayed.
Click the link to get Spire.PDFViewer 7.12.3:
More information of Spire.PDFViewer new release or hotfix:

OCR (Optical Character Recognition) technology is the primary method to extract text from images. Spire.OCR for Java provides developers with a quick and efficient solution to scan and extract text from images in Java projects. This article will guide you on how to use Spire.OCR for Java to recognize and extract text from images in Java projects.

Obtaining Spire.OCR for Java

To scan and recognize text in images using Spire.OCR for Java, you need to first import the Spire.OCR.jar file along with other relevant dependencies into your Java project.

You can download Spire.OCR for Java from our website. If you are using Maven, you can add the following code to your project's pom.xml file to import the JAR file into your application.

<repositories>
    <repository>
        <id>com.e-iceblue</id>
        <name>e-iceblue</name>
        <url>https://repo.e-iceblue.cn/repository/maven-public/</url>
    </repository>
</repositories>
<dependencies>
    <dependency>
        <groupId>e-iceblue</groupId>
        <artifactId>spire.ocr</artifactId>
        <version>1.9.0</version>
    </dependency>
</dependencies>

Please download the other dependencies based on your operating system:

Install Dependencies

Step 1: Create a Java project in IntelliJ IDEA.

How to Scan and Recognize Text from Images in Java Projects

Step 2: Go to File > Project Structure > Modules > Dependencies in the menu and add Spire.OCR.jar as a project dependency.

How to Scan and Recognize Text from Images in Java Projects

Step 3: Download and extract the other dependency files. Copy all the files from the extracted "dependencies" folder to your project directory.

How to Scan and Recognize Text from Images in Java Projects

Scanning and Recognizing Text from a Local Image

  • Java
import com.spire.ocr.OcrScanner;
import java.io.*;

public class ScanLocalImage {
    public static void main(String[] args) throws Exception {
        // Specify the path to the dependency files
        String dependencies = "dependencies/";
        // Specify the path to the image file to be scanned
        String imageFile = "data/Sample.png";
        // Specify the path to the output file
        String outputFile = "ScanLocalImage_out.txt";
        
        // Create an OcrScanner object
        OcrScanner scanner = new OcrScanner();
        // Set the dependency file path for the OcrScanner object
        scanner.setDependencies(dependencies);
        
        // Use the OcrScanner object to scan the specified image file
        scanner.scan(imageFile);
        
        // Get the scanned text content
        String scannedText = scanner.getText().toString();
        
        // Create an output file object
        File output = new File(outputFile);
        // If the output file already exists, delete it
        if (output.exists()) {
            output.delete();
        }
        // Create a BufferedWriter object to write content to the output file
        BufferedWriter writer = new BufferedWriter(new FileWriter(outputFile));
        // Write the scanned text content to the output file
        writer.write(scannedText);
        // Close the BufferedWriter object to release resources
        writer.close();
    }
}

Specify the Language File to Scan and Recognize Text from an Image

  • Java
import com.spire.ocr.OcrScanner;
import java.io.*;

public class ScanImageWithLanguageSelection {
    public static void main(String[] args) throws Exception {
        // Specify the path to the dependency files
        String dependencies = "dependencies/";
        // Specify the path to the language file
        String languageFile = "data/japandata";
        // Specify the path to the image file to be scanned
        String imageFile = "data/JapaneseSample.png";
        // Specify the path to the output file
        String outputFile = "ScanImageWithLanguageSelection_out.txt";
        
        // Create an OcrScanner object
        OcrScanner scanner = new OcrScanner();
        // Set the dependency file path for the OcrScanner object
        scanner.setDependencies(dependencies);
        // Load the specified language file
        scanner.loadLanguageFile(languageFile);
        
        // Use the OcrScanner object to scan the specified image file
        scanner.scan(imageFile);
        // Get the scanned text content
        String scannedText = scanner.getText().toString();

        // Create an output file object
        File output = new File(outputFile);
        // If the output file already exists, delete it
        if (output.exists()) {
            output.delete();
        }

        // Create a BufferedWriter object to write content to the output file
        BufferedWriter writer = new BufferedWriter(new FileWriter(outputFile));
        // Write the scanned text content to the output file
        writer.write(scannedText);
        // Close the BufferedWriter object to release resources
        writer.close();
    }
}

Apply for a Temporary License

If you'd like to remove the evaluation message from the generated documents, or to get rid of the function limitations, please request a 30-day trial license for yourself.

Thursday, 21 December 2023 01:35

Python: Convert PDF to HTML

HTML is the standard markup language for web pages. Converting PDF documents to HTML format allows you to embed them directly into web pages, making them accessible and viewable in web browsers without requiring additional software or plugins. In this article, we will demonstrate how to convert PDF to HTML in Python using Spire.PDF for Python.

Install Spire.PDF for Python

This scenario requires Spire.PDF for Python and plum-dispatch v1.7.4. They can be easily installed in your VS Code through the following pip command.

pip install Spire.PDF

If you are unsure how to install, please refer to this tutorial: How to Install Spire.PDF for Python in VS Code

Convert PDF to HTML in Python

The PdfDocument.SaveToFile() method provided by Spire.PDF for Python allows you to convert a PDF document to HTML format. The detailed steps are as follows.

  • Create an object of the PdfDocument class.
  • Load a PDF document using PdfDocument.LoadFromFile() method.
  • Save the PDF document to HTML format using PdfDocument.SaveToFile() method.
  • Python
from spire.pdf.common import *
from spire.pdf import *

# Create an object of the PdfDocument class
doc = PdfDocument()
# Load a PDF document
doc.LoadFromFile("Sample.pdf")

# Save the PDF document to HTML format
doc.SaveToFile("PdfToHtml.html", FileFormat.HTML)
doc.Close()

Python: Convert PDF to HTML

Set Conversion Options When Converting PDF to HTML in Python

The SetPdfToHtmlOptions() method of the PdfConvertOptions class enables you to specify the conversion options when converting PDF files to HTML. This method accepts the following parameters:

  • useEmbeddedSvg (bool): Indicates whether to embed SVG in the resulting HTML file.
  • useEmbeddedImg (bool): Indicates whether to embed images in the resulting HTML file. This option is applicable only when useEmbeddedSvg is set to False.
  • maxPageOneFile (bool): Specifies the maximum number of pages to be included per HTML file. This option is applicable only when useEmbeddedSvg is set to False.
  • useHighQualityEmbeddedSvg (bool): Indicates whether to use high-quality embedded SVG in the resulting HTML file. This option is applicable when useEmbeddedSvg is set to True.

The following steps explain how to specify conversion options when converting PDF to HTML.

  • Create an object of the PdfDocument class.
  • Load a PDF document using PdfDocument.LoadFromFile() method.
  • Get the PdfConvertOptions object using PdfDocument.ConvertOptions property.
  • Specify the PDF to HTML conversion options using PdfConvertOptions.SetPdfToHtmlOptions() method.
  • Save the PDF document to HTML format using PdfDocument.SaveToFile() method.
  • Python
from spire.pdf.common import *
from spire.pdf import *

# Create an object of the PdfDocument class
doc = PdfDocument()
# Load a PDF document
doc.LoadFromFile("Sample.pdf")

# Set the conversion options to embed images in the resulting HTML and limit one page per HTML file
pdfToHtmlOptions = doc.ConvertOptions
pdfToHtmlOptions.SetPdfToHtmlOptions(False, True, 1, False)

# Save the PDF document to HTML format
doc.SaveToFile("PdfToHtmlWithCustomOptions.html", FileFormat.HTML)
doc.Close()

Convert PDF to HTML Stream in Python

In addition to converting a PDF document to an HTML file, you can save it to an HTML stream by using the PdfDocument.SaveToStream() method. The detailed steps are as follows.

  • Create an object of the PdfDocument class.
  • Load a PDF document using PdfDocument.LoadFromFile() method.
  • Create an object of the Stream class.
  • Save the PDF document to an HTML stream using PdfDocument.SaveToStream() method.
  • Python
from spire.pdf.common import *
from spire.pdf import *

# Create an object of the PdfDocument class
doc = PdfDocument()
# Load a PDF document
doc.LoadFromFile("Sample.pdf")

# Save the PDF document to HTML stream
fileStream = Stream("PdfToHtmlStream.html")
doc.SaveToStream(fileStream, FileFormat.HTML)
fileStream.Close()
doc.Close()

Apply for a Temporary License

If you'd like to remove the evaluation message from the generated documents, or to get rid of the function limitations, please request a 30-day trial license for yourself.

Thursday, 21 December 2023 01:22

Python: Convert Word to RTF and Vice Versa

RTF is a flexible file format that preserves formatting and basic styling while offering compatibility with various word processing software. Converting Word to RTF enables users to retain document structure, fonts, hyperlinks, and other essential elements without the need for specialized software. Similarly, converting RTF back to Word format provides the flexibility to edit and enhance documents using the powerful features of Microsoft Word. In this article, you will learn how to convert Word to RTF and vice versa in Python using Spire.Doc for Python.

Install Spire.Doc for Python

This scenario requires Spire.Doc for Python and plum-dispatch v1.7.4. They can be easily installed in your VS Code through the following pip command.

pip install Spire.Doc

If you are unsure how to install, please refer to this tutorial: How to Install Spire.Doc for Python in VS Code

Convert Word to RTF in Python

With Spire.Doc for Python, you can load a Word file using the Document.LoadFromFile() method and convert it to a different format, such as RTF, using the Document.SaveToFile() method; Conversely, you can load an RTF file in the same way and save it as a Word file.

The following are the steps to convert Word to RTF using Spire.Doc for Python.

  • Create a Document object.
  • Load a Word file using Document.LoadFromFile() method.
  • Convert it to an RTF file using Document.SaveToFile() method.
  • Python
from spire.doc import *
from spire.doc.common import *
               
# Create a Document object
document = Document()

# Load a Word file
document.LoadFromFile("C:\\Users\\Administrator\\Desktop\\input.docx")

# Convert to a RTF file
document.SaveToFile("output/ToRtf.rtf", FileFormat.Rtf)
document.Close()

Python: Convert Word to RTF and Vice Versa

Convert RTF to Word in Python

The code for converting RTF to Word is quite simply, too. Follow the steps below.

  • Create a Document object.
  • Load an RTF file using Document.LoadFromFile() method.
  • Convert it to a Word file using Document.SaveToFile() method.
  • Python
from spire.doc import *
from spire.doc.common import *
               
# Create a Document object
document = Document()

# Load a Rtf file
document.LoadFromFile("C:\\Users\\Administrator\\Desktop\\input.rtf")

# Convert to a Word file
document.SaveToFile("output/ToWord.docx", FileFormat.Docx2019)
document.Close()

Python: Convert Word to RTF and Vice Versa

Apply for a Temporary License

If you'd like to remove the evaluation message from the generated documents, or to get rid of the function limitations, please request a 30-day trial license for yourself.

We are delighted to announce the release of Spire.XLS for Java 13.12.12. This version supports retrieving embedded images added with WPS tools. Besides, it enhances the conversion from XLSM to PDF. Moreover, some known issues are fixed successfully in this version, such as the issue that the program suspended when loading an Excel document. More details are listed below.

Here is a list of changes made in this release

Category ID Description
New feature SPIREXLS-4971 Adds the worksheet.getCellImages() method to retrieve embedded images added with WPS tools.
Workbook workbook = new Workbook();
workbook.loadFromFile("sample.xlsx");
Worksheet sheet = workbook.getWorksheets().get(0);
ExcelPicture[] picture = sheet.getCellImages();
for (int i = 0; i < picture.length; i++) {
 ExcelPicture ep = picture[i];
 BufferedImage image = ep.getPicture();
 ImageIO.write(image,"PNG", new File(outputFile + String.format("pic_%d.png",i)));
}
Bug SPIREXLS-4971 Fixes the issue that the program threw an exception "Index is less than 0 or more than or equal to the list count." when getting inline pictures added with WPS tools.
Bug SPIREXLS-4996 Fixes the issue that the program suspended when loading Excel documents.
Bug SPIREXLS-5010 Fixes the issue that the font size of the retrieved text was incorrect.
Bug SPIREXLS-5021 Fixes the issue that the coordinate axes data of charts in the saved Excel document were incorrect.
Bug SPIREXLS-5024 Fixes the issue that the program threw a java.lang.StringIndexOutOfBoundsException when converting an XLSM document to PDF.
Click the link to download Spire.XLS for Java 13.12.12: