Python: Create a Bar Chart in Excel
A bar chart is a type of graph that represents categorical data using rectangular bars. It is somewhat like a column chart, but with bars that extend horizontally from the Y-axis. The length of each bar corresponds to the value represented by a particular category or group, and changes, trends, or rankings can be quickly identified by comparing the lengths of the bars. In this article, you will learn how to create a clustered or stacked bar chart in Excel in Python using Spire.XLS for Python.
Install Spire.XLS for Python
This scenario requires Spire.XLS for Python and plum-dispatch v1.7.4. They can be easily installed in your VS Code through the following pip command.
pip install Spire.XLS
If you are unsure how to install, please refer to this tutorial: How to Install Spire.XLS for Python in VS Code
Create a Clustered Bar Chart in Excel in Python
The Worksheet.Chart.Add(ExcelChartType chartType) method provided by Spire.XLS for Python allows to add a chart to a worksheet. To add a clustered bar chart in Excel, you can set the chart type to BarClustered. The following are the steps.
- Create a Workbook object.
- Get a specific worksheet using Workbook.Worksheets[index] property.
- Add chart data to specified cells and set the cell styles.
- Add a clustered bar char to the worksheet using Worksheet.Chart.Add(ExcelChartType.BarClustered) method.
- Set data range for the chart using Chart.DataRange property.
- Set position, title, category axis and value axis for the chart.
- Save the result file using Workbook.SaveToFile() method.
- Python
from spire.common import * from spire.xls import * # Create a Workbook instance workbook = Workbook() # Get the first sheet and set its name sheet = workbook.Worksheets[0] sheet.Name = "ClusteredBar" # Add chart data to specified cells sheet.Range["A1"].Value = "Quarter" sheet.Range["A2"].Value = "Q1" sheet.Range["A3"].Value = "Q2" sheet.Range["A4"].Value = "Q3" sheet.Range["A5"].Value = "Q4" sheet.Range["B1"].Value = "Team A" sheet.Range["B2"].NumberValue = 3000 sheet.Range["B3"].NumberValue = 8000 sheet.Range["B4"].NumberValue = 9000 sheet.Range["B5"].NumberValue = 8500 sheet.Range["C1"].Value = "Team B" sheet.Range["C2"].NumberValue = 7000 sheet.Range["C3"].NumberValue = 2000 sheet.Range["C4"].NumberValue = 5000 sheet.Range["C5"].NumberValue = 4200 # Set cell style sheet.Range["A1:C1"].RowHeight = 18 sheet.Range["A1:C1"].Style.Color = Color.get_Black() sheet.Range["A1:C1"].Style.Font.Color = Color.get_White() sheet.Range["A1:C1"].Style.Font.IsBold = True sheet.Range["A1:C1"].Style.VerticalAlignment = VerticalAlignType.Center sheet.Range["A1:C1"].Style.HorizontalAlignment = HorizontalAlignType.Center sheet.Range["A2:A5"].Style.HorizontalAlignment = HorizontalAlignType.Center sheet.Range["B2:C5"].Style.NumberFormat = "\"$\"#,##0" # Add a clustered bar chart to the sheet chart = sheet.Charts.Add(ExcelChartType.BarClustered) # Set data range of the chart chart.DataRange = sheet.Range["A1:C5"] chart.SeriesDataFromRange = False # Set position of the chart chart.LeftColumn = 1 chart.TopRow = 6 chart.RightColumn = 11 chart.BottomRow = 29 # Set and format chart title chart.ChartTitle = "Team Sales Comparison per Quarter" chart.ChartTitleArea.IsBold = True chart.ChartTitleArea.Size = 12 # Set and format category axis chart.PrimaryCategoryAxis.Title = "Country" chart.PrimaryCategoryAxis.Font.IsBold = True chart.PrimaryCategoryAxis.TitleArea.IsBold = True chart.PrimaryCategoryAxis.TitleArea.TextRotationAngle = 90 # Set and format value axis chart.PrimaryValueAxis.Title = "Sales (in Dollars)" chart.PrimaryValueAxis.HasMajorGridLines = False chart.PrimaryValueAxis.MinValue = 1000 chart.PrimaryValueAxis.TitleArea.IsBold = True # Show data labels for data points for cs in chart.Series: cs.Format.Options.IsVaryColor = True cs.DataPoints.DefaultDataPoint.DataLabels.HasValue = True # Set legend position chart.Legend.Position = LegendPositionType.Top #Save the result file workbook.SaveToFile("ClusteredBarChart.xlsx", ExcelVersion.Version2016) workbook.Dispose()
Create a Stacked Bar Chart in Excel in Python
To create a stacked bar chart, you just need to change the Excel chart type to BarStacked. The following are the steps.
- Create a Workbook object.
- Get a specific worksheet using Workbook.Worksheets[index] property.
- Add chart data to specified cells and set the cell styles.
- Add a clustered bar char to the worksheet using Worksheet.Chart.Add(ExcelChartType.BarStacked) method.
- Set data range for the chart using Chart.DataRange property.
- Set position, title, category axis and value axis for the chart.
- Save the result file using Workbook.SaveToFile() method.
- Python
from spire.common import * from spire.xls import * # Create a Workbook instance workbook = Workbook() # Get the first sheet and set its name sheet = workbook.Worksheets[0] sheet.Name = "StackedBar" # Add chart data to specified cells sheet.Range["A1"].Value = "Quarter" sheet.Range["A2"].Value = "Q1" sheet.Range["A3"].Value = "Q2" sheet.Range["A4"].Value = "Q3" sheet.Range["A5"].Value = "Q4" sheet.Range["B1"].Value = "Team A" sheet.Range["B2"].NumberValue = 3000 sheet.Range["B3"].NumberValue = 8000 sheet.Range["B4"].NumberValue = 9000 sheet.Range["B5"].NumberValue = 8500 sheet.Range["C1"].Value = "Team B" sheet.Range["C2"].NumberValue = 7000 sheet.Range["C3"].NumberValue = 2000 sheet.Range["C4"].NumberValue = 5000 sheet.Range["C5"].NumberValue = 4200 # Set cell style sheet.Range["A1:C1"].RowHeight = 18 sheet.Range["A1:C1"].Style.Color = Color.get_Black() sheet.Range["A1:C1"].Style.Font.Color = Color.get_White() sheet.Range["A1:C1"].Style.Font.IsBold = True sheet.Range["A1:C1"].Style.VerticalAlignment = VerticalAlignType.Center sheet.Range["A1:C1"].Style.HorizontalAlignment = HorizontalAlignType.Center sheet.Range["A2:A5"].Style.HorizontalAlignment = HorizontalAlignType.Center sheet.Range["B2:C5"].Style.NumberFormat = "\"$\"#,##0" # Add a clustered bar chart to the sheet chart = sheet.Charts.Add(ExcelChartType.BarStacked) # Set data range of the chart chart.DataRange = sheet.Range["A1:C5"] chart.SeriesDataFromRange = False # Set position of the chart chart.LeftColumn = 1 chart.TopRow = 6 chart.RightColumn = 11 chart.BottomRow = 29 # Set and format chart title chart.ChartTitle = "Team Sales Comparison per Quarter" chart.ChartTitleArea.IsBold = True chart.ChartTitleArea.Size = 12 # Set and format category axis chart.PrimaryCategoryAxis.Title = "Country" chart.PrimaryCategoryAxis.Font.IsBold = True chart.PrimaryCategoryAxis.TitleArea.IsBold = True chart.PrimaryCategoryAxis.TitleArea.TextRotationAngle = 90 # Set and format value axis chart.PrimaryValueAxis.Title = "Sales (in Dollars)" chart.PrimaryValueAxis.HasMajorGridLines = False chart.PrimaryValueAxis.MinValue = 1000 chart.PrimaryValueAxis.TitleArea.IsBold = True # Show data labels for data points for cs in chart.Series: cs.Format.Options.IsVaryColor = True cs.DataPoints.DefaultDataPoint.DataLabels.HasValue = True # Set legend position chart.Legend.Position = LegendPositionType.Top #Save the result file workbook.SaveToFile("StackedBarChart.xlsx", ExcelVersion.Version2016) workbook.Dispose()
Apply for a Temporary License
If you'd like to remove the evaluation message from the generated documents, or to get rid of the function limitations, please request a 30-day trial license for yourself.
Python: Convert RTF to PDF, HTML
RTF (Rich Text Format) is a versatile file format that can be opened and viewed by various word processing software. It supports a wide range of text formatting options, such as font style, size, color, tables, images, and more. When working with RTF files, you may sometimes need to convert them to PDF files for better sharing and printing, or to HTML format for publishing on the web. In this article, you will learn how to convert RTF to PDF or HTML with Python using Spire.Doc for Python.
Install Spire.Doc for Python
This scenario requires Spire.Doc for Python and plum-dispatch v1.7.4. They can be easily installed in your VS Code through the following pip commands.
pip install Spire.Doc
If you are unsure how to install, please refer to this tutorial: How to Install Spire.Doc for Python in VS Code
Convert RTF to PDF in Python
To convert an RTF file to PDF, simply load a file with .rtf extension and then save it as a PDF file using Document.SaveToFile(fileName, FileFormat.PDF) method. The following are the detailed steps.
- Create a Document object.
- Load an RTF file using Document.LoadFromFile() method.
- Save the RTF file as a PDF file using Document.SaveToFile(fileName, FileFormat.PDF) method.
- Python
from spire.doc import * from spire.doc.common import * inputFile = "input.rtf" outputFile = "RtfToPDF.pdf" # Create a Document object doc = Document() # Load an RTF file from disk doc.LoadFromFile(inputFile) # Save the RTF file as a PDF file doc.SaveToFile(outputFile, FileFormat.PDF) doc.Close()
Convert RTF to HTML in Python
Spire.Doc for Python also allows you to use the Document.SaveToFile(fileName, FileFormat.Html) method to convert the loaded RTF file to HTML format. The following are the detailed steps.
- Create a Document object.
- Load an RTF file using Document.LoadFromFile() method.
- Save the RTF file in HTML format using Document.SaveToFile(fileName, FileFormat.Html) method.
- Python
from spire.doc import * from spire.doc.common import * inputFile = "input.rtf" outputFile = "RtfToHtml.html" # Create a Document object doc = Document() # Load an RTF file from disk doc.LoadFromFile(inputFile) # Save the RTF file in HTML format doc.SaveToFile(outputFile, FileFormat.Html) doc.Close()
Apply for a Temporary License
If you'd like to remove the evaluation message from the generated documents, or to get rid of the function limitations, please request a 30-day trial license for yourself.
Read Excel Files with Python
Table of Contents
Install with Pip
pip install Spire.XLS
Related Links
Excel files (spreadsheets) are used by people worldwide for organizing, analyzing, and storing tabular data. Due to their popularity, developers frequently encounter situations where they need to extract data from Excel or create reports in Excel format. Being able to read Excel files with Python opens up a wide range of possibilities for data processing and automation. In this article, you will learn how to read data (text or number values) from a cell, a cell range, or an entire worksheet by using the Spire.XLS for Python library.
- Read Data of a Particular Cell in Python
- Read Data from a Cell Range in Python
- Read Data from an Excel Worksheet in Python
- Read Value Rather than Formula in a Cell in Python
Python Library for Reading Excel
Spire.XLS for Python is a reliable enterprise-level Python library for creating, writing, reading and editing Excel documents (XLS, XLSX, XLSB, XLSM, ODS) in a Python application. It provides a comprehensive set of interfaces, classes and properties that allow programmers to read and write Excel files with ease. Specifically, a cell in a worksheet can be accessed using the Worksheet.Range property and the value of the cell can be obtained using the CellRange.Value property.
The library is easy to install by running the following pip command. If you’d like to manually import the necessary dependencies, refer to How to Install Spire.XLS for Python in VS Code
pip install Spire.XLS
Classes and Properties in Spire.XLS for Python API
- Workbook class: Represents an Excel workbook model, which you can use to create a workbook from scratch or load an existing Excel document and do modification on it.
- Worksheet class: Represents a worksheet in a workbook.
- CellRange class: Represents a specific cell or a cell range in a workbook.
- Worksheet.Range property: Gets a cell or a range and returns an object of CellRange class.
- Worksheet.AllocatedRange property: Gets the cell range containing data and returns an object of CellRange class.
- CellRange.Value property: Gets the number value or text value of a cell. But if a cell has a formula, this property returns the formula instead of the result of the formula.
Read Data of a Particular Cell in Python
With Spire.XLS for Python, you can easily get the value of a certain cell by using the CellRange.Value property. The steps to read data of a particular Excel cell in Python are as follows.
- Instantiate Workbook class
- Load an Excel document using LoadFromFile method.
- Get a specific worksheet using Workbook.Worksheets[index] property.
- Get a specific cell using Worksheet.Range property.
- Get the value of the cell using CellRange.Value property
- Python
from spire.xls import * from spire.xls.common import * # Create a Workbook object wb = Workbook() # Load an Excel file wb.LoadFromFile("C:\\Users\\Administrator\\Desktop\\Data.xlsx") # Get a specific worksheet sheet = wb.Worksheets[0] # Get a specific cell certainCell = sheet.Range["D9"] # Get the value of the cell print("D9 has the value: " + certainCell.Value)
Read Data from a Cell Range in Python
We already know how to get the value of a cell, to get the values of a range of cells, such as certain rows or columns, we just need to use loop statements to iterate through the cells, and then extract them one by one. The steps to read data from an Excel cell range in Python are as follows.
- Instantiate Workbook class
- Load an Excel document using LoadFromFile method.
- Get a specific worksheet using Workbook.Worksheets[index] property.
- Get a specific cell range using Worksheet.Range property.
- Use for loop statements to retrieve each cell in the range, and get the value of a specific cell using CellRange.Value property
- Python
from spire.xls import * from spire.xls.common import * # Create a Workbook object wb = Workbook() # Load an existing Excel file wb.LoadFromFile("C:\\Users\\Administrator\\Desktop\\Data.xlsx") # Get a specific worksheet sheet = wb.Worksheets[0] # Get a cell range cellRange = sheet.Range["A2:H5"] # Iterate through the rows for i in range(len(cellRange.Rows)): # Iterate through the columns for j in range(len(cellRange.Rows[i].Columns)): # Get data of a specific cell print(cellRange[i + 2, j + 1].Value + " ", end='') print("")
Read Data from an Excel Worksheet in Python
Spire.XLS for Python offers the Worksheet.AllocatedRange property to automatically obtain the cell range that contains data from a worksheet. Then, we traverse the cells within the cell range rather than the entire worksheet, and retrieve cell values one by one. The following are the steps to read data from an Excel worksheet in Python.
- Instantiate Workbook class
- Load an Excel document using LoadFromFile method.
- Get a specific worksheet using Workbook.Worksheets[index] property.
- Get the cell range containing data from the worksheet using Worksheet.AllocatedRange property.
- Use for loop statements to retrieve each cell in the range, and get the value of a specific cell using CellRange.Value property
- Python
from spire.xls import * from spire.xls.common import * # Create a Workbook object wb = Workbook() # Load an existing Excel file wb.LoadFromFile("C:\\Users\\Administrator\\Desktop\\Data.xlsx") # Get the first worksheet sheet = wb.Worksheets[0] # Get the cell range containing data locatedRange = sheet.AllocatedRange # Iterate through the rows for i in range(len(sheet.Rows)): # Iterate through the columns for j in range(len(locatedRange.Rows[i].Columns)): # Get data of a specific cell print(locatedRange[i + 1, j + 1].Value + " ", end='') print("")
Read Value Rather than Formula in a Cell in Python
As mentioned earlier, when a cell contains a formula, the CellRange.Value property returns the formula itself, not the value of the formula. If you want to get the value, you need to use the str(CellRange.FormulaValue) method. The following are the steps to read value rather than formula in an Excel cell in Python.
- Instantiate Workbook class
- Load an Excel document using LoadFromFile method.
- Get a specific worksheet using Workbook.Worksheets[index] property.
- Get a specific cell using Worksheet.Range property.
- Determine if the cell has a formula using CellRange.HasFormula property.
- Get the formula value of the cell using str(CellRange.FormulaValue) method
- Python
from spire.xls import * from spire.xls.common import * # Create a Workbook object wb = Workbook() # Load an Excel file wb.LoadFromFile("C:\\Users\\Administrator\\Desktop\\Formula.xlsx") # Get a specific worksheet sheet = wb.Worksheets[0] # Get a specific cell certainCell = sheet.Range["D4"] # Determine if the cell has formula if(certainCell.HasFormula): # Get the formula value of the cell print(str(certainCell.FormulaValue))
Conclusion
In this blog post, we learned how to read data from cells, cell regions, and worksheets in Python with the help of Spire.XLS for Python API. We also discussed how to determine if a cell has a formula and how to get the value of the formula. This library supports extraction of many other elements in Excel such as images, hyperlinks and OEL objects. Check out our online documentation for more tutorials. If you have any questions, please contact us by email or on the forum.
Python: Convert PDF to PDF/A and Vice Versa
PDF/A is a specialized format designed specifically for long-term archiving and preservation of electronic documents. It guarantees that the content, structure, and visual appearance of the documents remain unchanged over time. By converting PDF files to PDF/A format, you ensure the long-term accessibility of the documents, regardless of software, operating systems, or future technological advancements. Conversely, converting PDF/A files to standard PDF format makes it easier to edit, share, and collaborate on the documents, ensuring better compatibility across different applications, devices, and platforms. In this article, we will explain how to convert PDF to PDF/A and vice versa in Python using Spire.PDF for Python.
Install Spire.PDF for Python
This scenario requires Spire.PDF for Python and plum-dispatch v1.7.4. They can be easily installed in your VS Code through the following pip command.
pip install Spire.PDF
If you are unsure how to install, please refer to this tutorial: How to Install Spire.PDF for Python in VS Code
Convert PDF to PDF/A in Python
The PdfStandardsConverter class provided by Spire.PDF for Python supports converting PDF to various PDF/A formats, including PDF/A-1a, 2a, 3a, 1b, 2b and 3b. Moreover, it also supports converting PDF to PDF/X-1a:2001. The detailed steps are as follows.
- Specify the input file path and output folder.
- Create a PdfStandardsConverter object and pass the input file path to the constructor of the class as a parameter.
- Convert the input file to a Pdf/A-1a conformance file using PdfStandardsConverter.ToPdfA1A() method.
- Convert the input file to a Pdf/A-1b file using PdfStandardsConverter.ToPdfA1B() method.
- Convert the input file to a Pdf/A-2a file using PdfStandardsConverter.ToPdfA2A() method.
- Convert the input file to a Pdf/A-2b file using PdfStandardsConverter.ToPdfA2B() method.
- Convert the input file to a Pdf/A-3a file using PdfStandardsConverter.ToPdfA3A() method.
- Convert the input file to a Pdf/A-3b file using PdfStandardsConverter.ToPdfA3B() method.
- Convert the input file to a PDF/X-1a:2001 file using PdfStandardsConverter.ToPdfX1A2001() method.
- Python
from spire.pdf.common import * from spire.pdf import * # Specify the input file path and output folder inputFile = "Sample.pdf" outputFolder = "Output/" # Create an object of the PdfStandardsConverter class converter = PdfStandardsConverter(inputFile) # Convert the input file to PdfA1A converter.ToPdfA1A(outputFolder + "ToPdfA1A.pdf") # Convert the input file to PdfA1B converter.ToPdfA1B(outputFolder + "ToPdfA1B.pdf") # Convert the input file to PdfA2A converter.ToPdfA2A(outputFolder + "ToPdfA2A.pdf") # Convert the input file to PdfA2B converter.ToPdfA2B(outputFolder + "ToPdfA2B.pdf") # Convert the input file to PdfA3A converter.ToPdfA3A(outputFolder + "ToPdfA3A.pdf") # Convert the input file to PdfA3B converter.ToPdfA3B(outputFolder + "ToPdfA3B.pdf") # Convert the input file to PDF/X-1a:2001 converter.ToPdfX1A2001(outputFolder + "ToPdfX1a.pdf")
Convert PDF/A to PDF in Python
To convert a PDF/A file back to a standard PDF format, you need to create a new standard PDF file, and then draw the page content of the PDF/A file to the newly created PDF file. The detailed steps are as follows.
- Create a PdfDocument object.
- Load a PDF/A file using PdfDocument.LoadFromFile() method.
- Create a PdfNewDocument object and set its compression level as none.
- Loop through the pages in the original PDF/A file.
- Add pages to the newly created PDF using PdfDocumentBase.Pages.Add() method.
- Draw the page content of the original PDF/A file to the corresponding pages of the newly created PDF using PdfPageBase.CreateTemplate.Draw() method.
- Create a Stream object and then save the new PDF to the stream using PdfNewDocument.Save() method.
- Python
from spire.pdf.common import * from spire.pdf import * # Specify the input and output file paths inputFile = "Output/ToPdfA1A.pdf" outputFile = "PdfAToPdf.pdf" # Create an object of the PdfDocument class doc = PdfDocument() # Load a PDF file doc.LoadFromFile(inputFile) # Create a new standard PDF file newDoc = PdfNewDocument() newDoc.CompressionLevel = PdfCompressionLevel.none # Add pages to the newly created PDF and draw the page content of the loaded PDF onto the corresponding pages of the newly created PDF for i in range(doc.Pages.Count): page = doc.Pages.get_Item(i) size = page.Size p = newDoc.Pages.Add(size, PdfMargins(0.0)) page.CreateTemplate().Draw(p, 0.0, 0.0) # Save the new PDF to a PDF file fileStream = Stream(outputFile) newDoc.Save(fileStream) fileStream.Close() newDoc.Close(True)
Apply for a Temporary License
If you'd like to remove the evaluation message from the generated documents, or to get rid of the function limitations, please request a 30-day trial license for yourself.
Spire.OCR for Java Released: Welcome to Try It Out!
We are thrilled to announce the launch of Spire.OCR for Java, an OCR (Optical Character Recognition) product designed for efficient text extraction from images. Developed by E-iceblue, Spire.OCR for Java is now available to empower developers with seamless integration of OCR functionalities into their Java applications.
Spire.OCR for Java
Spire.OCR for Java is a professional OCR library to read text from Images in JPG, PNG, GIF, BMP and TIFF formats. Developers can easily add OCR functionalities on Java applications (J2SE and J2EE). It supports commonly used image formats and provides functionalities like reading multiple characters and fonts from images, bold and italic styles and much more.
Contact Us
- Support Team: support@e-iceblue.com
- Sales Team: sales@e-iceblue.com
- Skype ID: iceblue.support
Spire.PDFViewer 7.12.3 supports the interface zoom effect using Ctrl + scroll wheel in WinForm projects
We are pleased to announce the release of Spire.PDFViewer 7.12.3. This version supports the interface zoom effect by using Ctrl + scroll wheel in WinForm projects. What’s more, it also fixes the issue that text content could not be displayed. More details are listed below.
Here is a list of changes made in this release
Category | ID | Description |
New feature | SPIREPDFVIEWER-579 | Supports the interface zoom effect by using Ctrl + scroll wheel in WinForm projects.
this.KeyPreview = true; this.KeyDown += new System.Windows.Forms.KeyEventHandler(this.Form1_KeyDown); this.KeyUp += new System.Windows.Forms.KeyEventHandler(Form1_KeyUp); this.MouseWheel += new System.Windows.Forms.MouseEventHandler(Form1_MouseWheel); private bool m_PressCtrl = false; private float m_ZoomFactor = 1.0f; private void Form1_KeyDown(object sender, KeyEventArgs e) { m_PressCtrl = e.Control; } private void Form1_KeyUp(object sender, KeyEventArgs e) { m_PressCtrl = false; } private float[] array = new float[] { 0.5f, 0.75f, 1f, 1.25f, 1.5f, 2f, 4f }; private int index = 2; private void Form1_MouseWheel(object sender, MouseEventArgs e) { if (m_PressCtrl) { if (e.Delta > 0) { index = index < 6 ? index + 1 : 6; } if (e.Delta < 0) { index = index == 0 ? 0 : index - 1; } this.pdfViewer1.SetZoomFactor(array[index]); } } |
Bug | SPIREPDFVIEWER -577 | Fixes the issue that text content could not be displayed. |
How to Scan and Recognize Text from Images in Java Projects
OCR (Optical Character Recognition) technology is the primary method to extract text from images. Spire.OCR for Java provides developers with a quick and efficient solution to scan and extract text from images in Java projects. This article will guide you on how to use Spire.OCR for Java to recognize and extract text from images in Java projects.
Obtaining Spire.OCR for Java
To scan and recognize text in images using Spire.OCR for Java, you need to first import the Spire.OCR.jar file along with other relevant dependencies into your Java project.
You can download Spire.OCR for Java from our website. If you are using Maven, you can add the following code to your project's pom.xml file to import the JAR file into your application.
<repositories> <repository> <id>com.e-iceblue</id> <name>e-iceblue</name> <url>https://repo.e-iceblue.cn/repository/maven-public/</url> </repository> </repositories> <dependencies> <dependency> <groupId>e-iceblue</groupId> <artifactId>spire.ocr</artifactId> <version>1.9.0</version> </dependency> </dependencies>
Please download the other dependencies based on your operating system:
Install Dependencies
Step 1: Create a Java project in IntelliJ IDEA.
Step 2: Go to File > Project Structure > Modules > Dependencies in the menu and add Spire.OCR.jar as a project dependency.
Step 3: Download and extract the other dependency files. Copy all the files from the extracted "dependencies" folder to your project directory.
Scanning and Recognizing Text from a Local Image
- Java
import com.spire.ocr.OcrScanner; import java.io.*; public class ScanLocalImage { public static void main(String[] args) throws Exception { // Specify the path to the dependency files String dependencies = "dependencies/"; // Specify the path to the image file to be scanned String imageFile = "data/Sample.png"; // Specify the path to the output file String outputFile = "ScanLocalImage_out.txt"; // Create an OcrScanner object OcrScanner scanner = new OcrScanner(); // Set the dependency file path for the OcrScanner object scanner.setDependencies(dependencies); // Use the OcrScanner object to scan the specified image file scanner.scan(imageFile); // Get the scanned text content String scannedText = scanner.getText().toString(); // Create an output file object File output = new File(outputFile); // If the output file already exists, delete it if (output.exists()) { output.delete(); } // Create a BufferedWriter object to write content to the output file BufferedWriter writer = new BufferedWriter(new FileWriter(outputFile)); // Write the scanned text content to the output file writer.write(scannedText); // Close the BufferedWriter object to release resources writer.close(); } }
Specify the Language File to Scan and Recognize Text from an Image
- Java
import com.spire.ocr.OcrScanner; import java.io.*; public class ScanImageWithLanguageSelection { public static void main(String[] args) throws Exception { // Specify the path to the dependency files String dependencies = "dependencies/"; // Specify the path to the language file String languageFile = "data/japandata"; // Specify the path to the image file to be scanned String imageFile = "data/JapaneseSample.png"; // Specify the path to the output file String outputFile = "ScanImageWithLanguageSelection_out.txt"; // Create an OcrScanner object OcrScanner scanner = new OcrScanner(); // Set the dependency file path for the OcrScanner object scanner.setDependencies(dependencies); // Load the specified language file scanner.loadLanguageFile(languageFile); // Use the OcrScanner object to scan the specified image file scanner.scan(imageFile); // Get the scanned text content String scannedText = scanner.getText().toString(); // Create an output file object File output = new File(outputFile); // If the output file already exists, delete it if (output.exists()) { output.delete(); } // Create a BufferedWriter object to write content to the output file BufferedWriter writer = new BufferedWriter(new FileWriter(outputFile)); // Write the scanned text content to the output file writer.write(scannedText); // Close the BufferedWriter object to release resources writer.close(); } }
Apply for a Temporary License
If you'd like to remove the evaluation message from the generated documents, or to get rid of the function limitations, please request a 30-day trial license for yourself.
Python: Convert PDF to HTML
HTML is the standard markup language for web pages. Converting PDF documents to HTML format allows you to embed them directly into web pages, making them accessible and viewable in web browsers without requiring additional software or plugins. In this article, we will demonstrate how to convert PDF to HTML in Python using Spire.PDF for Python.
- Convert PDF to HTML in Python
- Set Conversion Options When Converting PDF to HTML in Python
- Convert PDF to HTML Stream in Python
Install Spire.PDF for Python
This scenario requires Spire.PDF for Python and plum-dispatch v1.7.4. They can be easily installed in your VS Code through the following pip command.
pip install Spire.PDF
If you are unsure how to install, please refer to this tutorial: How to Install Spire.PDF for Python in VS Code
Convert PDF to HTML in Python
The PdfDocument.SaveToFile() method provided by Spire.PDF for Python allows you to convert a PDF document to HTML format. The detailed steps are as follows.
- Create an object of the PdfDocument class.
- Load a PDF document using PdfDocument.LoadFromFile() method.
- Save the PDF document to HTML format using PdfDocument.SaveToFile() method.
- Python
from spire.pdf.common import * from spire.pdf import * # Create an object of the PdfDocument class doc = PdfDocument() # Load a PDF document doc.LoadFromFile("Sample.pdf") # Save the PDF document to HTML format doc.SaveToFile("PdfToHtml.html", FileFormat.HTML) doc.Close()
Set Conversion Options When Converting PDF to HTML in Python
The SetPdfToHtmlOptions() method of the PdfConvertOptions class enables you to specify the conversion options when converting PDF files to HTML. This method accepts the following parameters:
- useEmbeddedSvg (bool): Indicates whether to embed SVG in the resulting HTML file.
- useEmbeddedImg (bool): Indicates whether to embed images in the resulting HTML file. This option is applicable only when useEmbeddedSvg is set to False.
- maxPageOneFile (bool): Specifies the maximum number of pages to be included per HTML file. This option is applicable only when useEmbeddedSvg is set to False.
- useHighQualityEmbeddedSvg (bool): Indicates whether to use high-quality embedded SVG in the resulting HTML file. This option is applicable when useEmbeddedSvg is set to True.
The following steps explain how to specify conversion options when converting PDF to HTML.
- Create an object of the PdfDocument class.
- Load a PDF document using PdfDocument.LoadFromFile() method.
- Get the PdfConvertOptions object using PdfDocument.ConvertOptions property.
- Specify the PDF to HTML conversion options using PdfConvertOptions.SetPdfToHtmlOptions() method.
- Save the PDF document to HTML format using PdfDocument.SaveToFile() method.
- Python
from spire.pdf.common import * from spire.pdf import * # Create an object of the PdfDocument class doc = PdfDocument() # Load a PDF document doc.LoadFromFile("Sample.pdf") # Set the conversion options to embed images in the resulting HTML and limit one page per HTML file pdfToHtmlOptions = doc.ConvertOptions pdfToHtmlOptions.SetPdfToHtmlOptions(False, True, 1, False) # Save the PDF document to HTML format doc.SaveToFile("PdfToHtmlWithCustomOptions.html", FileFormat.HTML) doc.Close()
Convert PDF to HTML Stream in Python
In addition to converting a PDF document to an HTML file, you can save it to an HTML stream by using the PdfDocument.SaveToStream() method. The detailed steps are as follows.
- Create an object of the PdfDocument class.
- Load a PDF document using PdfDocument.LoadFromFile() method.
- Create an object of the Stream class.
- Save the PDF document to an HTML stream using PdfDocument.SaveToStream() method.
- Python
from spire.pdf.common import * from spire.pdf import * # Create an object of the PdfDocument class doc = PdfDocument() # Load a PDF document doc.LoadFromFile("Sample.pdf") # Save the PDF document to HTML stream fileStream = Stream("PdfToHtmlStream.html") doc.SaveToStream(fileStream, FileFormat.HTML) fileStream.Close() doc.Close()
Apply for a Temporary License
If you'd like to remove the evaluation message from the generated documents, or to get rid of the function limitations, please request a 30-day trial license for yourself.
Python: Convert Word to RTF and Vice Versa
RTF is a flexible file format that preserves formatting and basic styling while offering compatibility with various word processing software. Converting Word to RTF enables users to retain document structure, fonts, hyperlinks, and other essential elements without the need for specialized software. Similarly, converting RTF back to Word format provides the flexibility to edit and enhance documents using the powerful features of Microsoft Word. In this article, you will learn how to convert Word to RTF and vice versa in Python using Spire.Doc for Python.
Install Spire.Doc for Python
This scenario requires Spire.Doc for Python and plum-dispatch v1.7.4. They can be easily installed in your VS Code through the following pip command.
pip install Spire.Doc
If you are unsure how to install, please refer to this tutorial: How to Install Spire.Doc for Python in VS Code
Convert Word to RTF in Python
With Spire.Doc for Python, you can load a Word file using the Document.LoadFromFile() method and convert it to a different format, such as RTF, using the Document.SaveToFile() method; Conversely, you can load an RTF file in the same way and save it as a Word file.
The following are the steps to convert Word to RTF using Spire.Doc for Python.
- Create a Document object.
- Load a Word file using Document.LoadFromFile() method.
- Convert it to an RTF file using Document.SaveToFile() method.
- Python
from spire.doc import * from spire.doc.common import * # Create a Document object document = Document() # Load a Word file document.LoadFromFile("C:\\Users\\Administrator\\Desktop\\input.docx") # Convert to a RTF file document.SaveToFile("output/ToRtf.rtf", FileFormat.Rtf) document.Close()
Convert RTF to Word in Python
The code for converting RTF to Word is quite simply, too. Follow the steps below.
- Create a Document object.
- Load an RTF file using Document.LoadFromFile() method.
- Convert it to a Word file using Document.SaveToFile() method.
- Python
from spire.doc import * from spire.doc.common import * # Create a Document object document = Document() # Load a Rtf file document.LoadFromFile("C:\\Users\\Administrator\\Desktop\\input.rtf") # Convert to a Word file document.SaveToFile("output/ToWord.docx", FileFormat.Docx2019) document.Close()
Apply for a Temporary License
If you'd like to remove the evaluation message from the generated documents, or to get rid of the function limitations, please request a 30-day trial license for yourself.
Spire.XLS for Java 13.12.12 supports retrieving embedded images added with WPS tools
We are delighted to announce the release of Spire.XLS for Java 13.12.12. This version supports retrieving embedded images added with WPS tools. Besides, it enhances the conversion from XLSM to PDF. Moreover, some known issues are fixed successfully in this version, such as the issue that the program suspended when loading an Excel document. More details are listed below.
Here is a list of changes made in this release
Category | ID | Description |
New feature | SPIREXLS-4971 | Adds the worksheet.getCellImages() method to retrieve embedded images added with WPS tools.
Workbook workbook = new Workbook(); workbook.loadFromFile("sample.xlsx"); Worksheet sheet = workbook.getWorksheets().get(0); ExcelPicture[] picture = sheet.getCellImages(); for (int i = 0; i < picture.length; i++) { ExcelPicture ep = picture[i]; BufferedImage image = ep.getPicture(); ImageIO.write(image,"PNG", new File(outputFile + String.format("pic_%d.png",i))); } |
Bug | SPIREXLS-4971 | Fixes the issue that the program threw an exception "Index is less than 0 or more than or equal to the list count." when getting inline pictures added with WPS tools. |
Bug | SPIREXLS-4996 | Fixes the issue that the program suspended when loading Excel documents. |
Bug | SPIREXLS-5010 | Fixes the issue that the font size of the retrieved text was incorrect. |
Bug | SPIREXLS-5021 | Fixes the issue that the coordinate axes data of charts in the saved Excel document were incorrect. |
Bug | SPIREXLS-5024 | Fixes the issue that the program threw a java.lang.StringIndexOutOfBoundsException when converting an XLSM document to PDF. |