Hiding and showing slides are two practical features in PowerPoint that let you control which slides appear during a slideshow. Hiding a slide is useful when you want to skip it or temporarily leave it out of the presentation without deleting it, while showing a slide makes a hidden slide visible again. In this article, we will demonstrate how to hide and show slides in a PowerPoint presentation in Python using Spire.Presentation for Python.

Install Spire.Presentation for Python

This scenario requires Spire.Presentation for Python and plum-dispatch v1.7.4. They can be easily installed in your VS Code through the following pip command.

pip install Spire.Presentation

If you are unsure how to install, please refer to this tutorial: How to Install Spire.Presentation for Python in VS Code

Hide a Specific Slide in PowerPoint in Python

Spire.Presentation for Python provides the ISlide.Hidden property to control the visibility of a slide during a slideshow. If you don’t want a certain slide to be shown, you can hide this slide by setting the ISlide.Hidden property as True. The detailed steps are as follows.

  • Create an object of the Presentation class.
  • Load a PowerPoint presentation using Presentation.LoadFromFile() method.
  • Get a specific slide using Presentation.Slides[index] property.
  • Hide the slide by setting the ISlide.Hidden property as True.
  • Save the resulting presentation using Presentation.SaveToFile() method.
  • Python
from spire.presentation.common import *
from spire.presentation import *

# Create an object of the Presentation class
ppt = Presentation()
# Load a PowerPoint presentation
ppt.LoadFromFile("Sample.pptx")

# Get the second slide and hide it
slide = ppt.Slides[1]
slide.Hidden = True

# Save the resulting presentation to a new .pptx file
ppt.SaveToFile("HideSlide.pptx", FileFormat.Pptx2016)
ppt.Dispose()

Python: Hide or Show Slides in PowerPoint Presentations
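
If you need to hide several slides at once, the same Hidden property can be set in a loop. The following is a minimal sketch rather than part of the Spire.Presentation API: hide_slides is a hypothetical helper, and the stand-in objects only mimic the Hidden property so that the snippet runs without a presentation file.

```python
from types import SimpleNamespace


def hide_slides(slides, slide_numbers):
    """Hide the slides with the given 1-based numbers.

    Returns the numbers that were actually hidden; numbers outside
    the deck are ignored.
    """
    hidden = []
    for number in sorted(set(slide_numbers)):
        if 1 <= number <= len(slides):
            slides[number - 1].Hidden = True
            hidden.append(number)
    return hidden


# Stand-in slide objects (assumption: real slides expose the same Hidden property)
deck = [SimpleNamespace(Hidden=False) for _ in range(4)]
result = hide_slides(deck, [2, 4, 9])
print(result)
```

With a loaded presentation, the list of slides could be built from the calls already shown above, for example [ppt.Slides[i] for i in range(ppt.Slides.Count)].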

Show a Hidden Slide in PowerPoint in Python

To show a hidden slide, you can set the ISlide.Hidden property as False. The detailed steps are as follows.

  • Create an object of the Presentation class.
  • Load a PowerPoint presentation using Presentation.LoadFromFile() method.
  • Get a specific slide using Presentation.Slides[index] property.
  • Unhide the slide by setting the ISlide.Hidden property as False.
  • Save the resulting presentation using Presentation.SaveToFile() method.
  • Python
from spire.presentation.common import *
from spire.presentation import *

# Create an object of the Presentation class
ppt = Presentation()
# Load a PowerPoint presentation
ppt.LoadFromFile("HideSlide.pptx")

# Get the second slide and unhide it
slide = ppt.Slides[1]
slide.Hidden = False

# Save the resulting presentation to a new .pptx file
ppt.SaveToFile("ShowSlide.pptx", FileFormat.Pptx2016)
ppt.Dispose()


Show All Hidden Slides in PowerPoint in Python

To show all hidden slides in a PowerPoint presentation, you need to loop through all the slides in the presentation, then find the hidden slides and unhide them by setting the ISlide.Hidden property as False. The detailed steps are as follows.

  • Create an object of the Presentation class.
  • Load a PowerPoint presentation using Presentation.LoadFromFile() method.
  • Loop through the slides in the presentation.
  • Check whether the current slide is hidden or not using ISlide.Hidden property. If the result is true, unhide the slide by setting the ISlide.Hidden property as False.
  • Save the resulting presentation using Presentation.SaveToFile() method.
  • Python
from spire.presentation.common import *
from spire.presentation import *

# Create an object of the Presentation class
ppt = Presentation()
# Load a PowerPoint presentation
ppt.LoadFromFile("Sample2.pptx")

# Loop through each slide in the presentation
for i in range(ppt.Slides.Count):
    slide = ppt.Slides[i]
    # Check if the slide is hidden
    if slide.Hidden:
        # Unhide the slide
        slide.Hidden = False

# Save the resulting presentation to a new .pptx file
ppt.SaveToFile("ShowAllHiddenSlides.pptx", FileFormat.Pptx2016)
ppt.Dispose()
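
When unhiding slides in bulk, it can be handy to know which slides were changed. The sketch below uses a hypothetical helper, unhide_all, and stand-in objects that only mimic the Hidden property; it illustrates the loop described above and is not a Spire.Presentation function.

```python
from types import SimpleNamespace


def unhide_all(slides):
    """Unhide every hidden slide and return the 1-based numbers that changed."""
    changed = []
    for number, slide in enumerate(slides, start=1):
        if slide.Hidden:
            slide.Hidden = False
            changed.append(number)
    return changed


# Stand-in deck: slides 2 and 3 start out hidden
deck = [SimpleNamespace(Hidden=h) for h in (False, True, True, False)]
changed = unhide_all(deck)
print(changed)
```

An empty result means the presentation had no hidden slides, so saving a new copy may be unnecessary.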

Apply for a Temporary License

If you'd like to remove the evaluation message from the generated documents, or to get rid of the function limitations, please request a 30-day trial license for yourself.

Thursday, 12 October 2023 02:58

Python: Convert Word to HTML

Converting Word documents to HTML enables easy sharing and publishing of content online. Additionally, HTML content is more search engine friendly, thus converting to HTML also allows search engines to better index and rank your content, increasing its visibility in search results. In this article, you will learn how to programmatically convert Word to HTML using Spire.Doc for Python.

Install Spire.Doc for Python

This scenario requires Spire.Doc for Python and plum-dispatch v1.7.4. They can be easily installed in your VS Code through the following pip command.

pip install Spire.Doc

If you are unsure how to install, please refer to this tutorial: How to Install Spire.Doc for Python in VS Code

Convert Word Doc/Docx to HTML in Python

Spire.Doc for Python offers the Document.SaveToFile(fileName, FileFormat.Html) method to save a .doc or .docx document as an HTML file. The following are the detailed steps.

  • Create a Document object.
  • Load a Word document using Document.LoadFromFile() method.
  • Save the document as an HTML file using Document.SaveToFile() method.
  • Python
from spire.doc import *
from spire.doc.common import *
     
# Create a Document instance
document = Document()

# Load a doc or docx document 
document.LoadFromFile("Statement.docx")

# Save to HTML
document.SaveToFile("WordToHtml.html", FileFormat.Html)
document.Close()


Convert Word to HTML with Export Options in Python

Spire.Doc for Python also offers the HtmlExportOptions class to set Word to HTML export options during conversion, such as whether to embed CSS styles, images, and whether to export form fields as plain text. The following are the detailed steps.

  • Create a Document object.
  • Load a Word document using Document.LoadFromFile() method.
  • Set how CSS styles are exported (embedded or written to an external style sheet) using Document.HtmlExportOptions.CssStyleSheetType property.
  • Set whether to embed images using Document.HtmlExportOptions.ImageEmbedded property.
  • Set whether to export form fields as plain text using Document.HtmlExportOptions.IsTextInputFormFieldAsText property.
  • Save the result document using Document.SaveToFile() method.
  • Python
from spire.doc import *
from spire.doc.common import *

# Create a Document instance
document = Document()

# Load a Word document
document.LoadFromFile("Statement.docx")

# Write CSS styles to an external style sheet file
document.HtmlExportOptions.CssStyleSheetFileName = "sample.css"
document.HtmlExportOptions.CssStyleSheetType = CssStyleSheetType.External

# Do not embed images; store them under the given path instead
document.HtmlExportOptions.ImageEmbedded = False
document.HtmlExportOptions.ImagesPath = "Images/"

# Set whether to export form fields as plain text
document.HtmlExportOptions.IsTextInputFormFieldAsText = True

# Save the document as an html file
document.SaveToFile("ToHtmlExportOption.html", FileFormat.Html)
document.Close()
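
After exporting, you may want to confirm that the HTML file references the external style sheet and the image folder as intended. The following is a minimal sketch using only Python's standard html.parser module. The sample markup is a hypothetical stand-in, not actual Spire.Doc output, and ResourceCollector is an illustrative name.

```python
from html.parser import HTMLParser


class ResourceCollector(HTMLParser):
    """Collect linked style sheets and image sources from an HTML string."""

    def __init__(self):
        super().__init__()
        self.stylesheets = []
        self.images = []

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if tag == "link" and (attributes.get("rel") or "").lower() == "stylesheet":
            self.stylesheets.append(attributes.get("href"))
        elif tag == "img":
            self.images.append(attributes.get("src"))


# Hypothetical markup; a real check would read the exported .html file instead
sample_html = (
    '<html><head><link rel="stylesheet" href="sample.css"></head>'
    '<body><p>Text</p><img src="Images/img1.png"></body></html>'
)

collector = ResourceCollector()
collector.feed(sample_html)
print(collector.stylesheets, collector.images)
```

If the style sheet list comes back empty, the styles were most likely written inline or into the page itself rather than to an external file.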


Thursday, 12 October 2023 01:07

C++: Change Page Margins in Word

Page margins are the blank spaces at the top, bottom, left, and right edges of a document page. In Word, it may sometimes be quite necessary to adjust the margins to meet the layout requirements of specific documents, such as academic papers, business reports, or creative projects. This article will demonstrate how to programmatically change the page margins of an existing Word document using Spire.Doc for C++.

Install Spire.Doc for C++

There are two ways to integrate Spire.Doc for C++ into your application. One way is to install it through NuGet, and the other way is to download the package from our website and copy the libraries into your program. Installation via NuGet is simpler and is the recommended approach. You can find more details by visiting the following link.

Integrate Spire.Doc for C++ in a C++ Application

Set Page Margins in Word in C++

The MarginsF class provided by Spire.Doc for C++ represents the page margins in Word. To set or change the margins of a Word document, you can use the methods of MarginsF class. The following are the detailed steps.

  • Create a Document object.
  • Load a Word document using Document->LoadFromFile() method.
  • Get a specified section using Document->GetSections()->GetItemInSectionCollection() method.
  • Get the page margins of the section using Section->GetPageSetup()->GetMargins() method.
  • Set the top, bottom, left and right margins for the pages in the section using MarginsF->SetTop(), MarginsF->SetBottom(), MarginsF->SetLeft(), MarginsF->SetRight() methods.
  • Save the result document using Document->SaveToFile() method.
  • C++
#include "Spire.Doc.o.h"

using namespace std;
using namespace Spire::Doc;

int main() {
	//Specify the input and output file paths
	wstring inputFile = L"Data\\Foods.docx";
	wstring outputFile = L"SetMargins.docx";

	//Create a Document instance
	intrusive_ptr<Document> document = new Document();

	//Load a Word document
	document->LoadFromFile(inputFile.c_str());

	//Get the first section
	intrusive_ptr<Section> section = document->GetSections()->GetItemInSectionCollection(0);

	//Set top, bottom, left and right page margins for the section 
	section->GetPageSetup()->GetMargins()->SetTop(38.0f);
	section->GetPageSetup()->GetMargins()->SetBottom(38.0f);
	section->GetPageSetup()->GetMargins()->SetLeft(29.5f);
	section->GetPageSetup()->GetMargins()->SetRight(29.5f);

	//Save the result document
	document->SaveToFile(outputFile.c_str(), FileFormat::Docx2016);
	document->Close();
}


Adding and deleting slides in PowerPoint are essential actions that allow presenters to control the structure and content of their presentations. Adding slides provides the opportunity to expand and enhance the presentation by introducing new topics or providing supporting information. On the other hand, deleting slides helps streamline the presentation by removing redundant, repetitive, or irrelevant content. In this article, we will demonstrate how to add or delete slides in a PowerPoint Presentation in Python using Spire.Presentation for Python.

Install Spire.Presentation for Python

This scenario requires Spire.Presentation for Python and plum-dispatch v1.7.4. They can be easily installed in your VS Code through the following pip command.

pip install Spire.Presentation

If you are unsure how to install, please refer to this tutorial: How to Install Spire.Presentation for Python in VS Code

Add a New Slide at the End of the PowerPoint Document in Python

Spire.Presentation for Python provides the Presentation.Slides.Append() method to add a new slide after the last slide of a PowerPoint presentation. The detailed steps are as follows.

  • Create an object of the Presentation class.
  • Load a PowerPoint presentation using Presentation.LoadFromFile() method.
  • Add a new blank slide at the end of the presentation using Presentation.Slides.Append() method.
  • Save the result presentation using Presentation.SaveToFile() method.
  • Python
from spire.presentation.common import *
from spire.presentation import *

# Create a Presentation object
presentation = Presentation()

# Load a PowerPoint presentation
presentation.LoadFromFile("Sample.pptx")

# Add a new slide at the end of the presentation
presentation.Slides.Append()

# Save the result presentation to a .pptx file
presentation.SaveToFile("AddSlide.pptx", FileFormat.Pptx2013)
presentation.Dispose()

Python: Add or Delete Slides in PowerPoint Presentations

Insert a New Slide Before a Specific Slide in PowerPoint in Python

You can use the Presentation.Slides.Insert() method to insert a new slide before a specific slide of a PowerPoint presentation. The detailed steps are as follows.

  • Create an object of the Presentation class.
  • Load a PowerPoint presentation using Presentation.LoadFromFile() method.
  • Insert a blank slide before a specific slide using Presentation.Slides.Insert() method.
  • Save the result presentation using Presentation.SaveToFile() method.
  • Python
from spire.presentation.common import *
from spire.presentation import *

# Create a Presentation object
presentation = Presentation()

# Load a PowerPoint presentation
presentation.LoadFromFile("Sample.pptx")

# Insert a blank slide before the second slide
presentation.Slides.Insert(1)

# Save the result presentation to a .pptx file
presentation.SaveToFile("InsertSlide.pptx", FileFormat.Pptx2013)
presentation.Dispose()


Delete a Specific Slide from a PowerPoint Document in Python

To delete a specific slide from a PowerPoint presentation, you can use the Presentation.Slides.RemoveAt() method. The detailed steps are as follows.

  • Create an object of the Presentation class.
  • Load a PowerPoint presentation using Presentation.LoadFromFile() method.
  • Remove a specific slide from the presentation using Presentation.Slides.RemoveAt() method.
  • Save the result presentation using Presentation.SaveToFile() method.
  • Python
from spire.presentation.common import *
from spire.presentation import *

# Create a Presentation object
presentation = Presentation()

# Load a PowerPoint presentation
presentation.LoadFromFile("Sample.pptx")

# Remove the first slide
presentation.Slides.RemoveAt(0)

# Save the result presentation to a .pptx file
presentation.SaveToFile("RemoveSlide.pptx", FileFormat.Pptx2013)
presentation.Dispose()
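
The zero-based indexes used by Insert() and RemoveAt() in the examples above behave much like positions in a Python list. The sketch below is only an analogy built on a plain list of slide titles (hypothetical data); it does not call Spire.Presentation.

```python
# A plain list standing in for the slide collection
slides = ["Title", "Agenda", "Summary"]

# Like Slides.Append(): the new slide goes after the last one
slides.append("Blank (appended)")

# Like Slides.Insert(1): the new slide lands before the current second slide
slides.insert(1, "Blank (inserted)")

# Like Slides.RemoveAt(0): the first slide is removed and the rest shift up
del slides[0]

print(slides)
```

Note that every insertion or removal shifts the indexes of the slides that follow, so when removing several slides it is usually safer to work from the highest index down.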


Tuesday, 10 October 2023 00:51

Python: Set Page Margins for Word Documents

Setting proper margins is an essential step in creating professional Word documents. Margins may seem like a small detail, but they play a vital role in improving the readability and visual appeal of a document. By defining the space around content, margins help maintain a consistent and balanced layout, prevent text from being truncated, and make documents look more organized and aesthetically pleasing. This article will show how to use Spire.Doc for Python to set page margins for Word documents through Python programs.

Install Spire.Doc for Python

This scenario requires Spire.Doc for Python and plum-dispatch v1.7.4. They can be easily installed in your VS Code through the following pip command.

pip install Spire.Doc

If you are unsure how to install, please refer to this tutorial: How to Install Spire.Doc for Python in VS Code

Set the Page Margins of a Word Document

Spire.Doc for Python provides properties under the Margins class that can be used to set margins for each side of a document separately or to set the same margins for all sides. One important thing to note is that the margins are set based on sections. For consistent margins throughout the document, it is necessary to iterate through each section of the document to set the margins. Below are the detailed steps for setting page margins:

  • Create an object of Document class.
  • Load a Word document using Document.LoadFromFile() method.
  • Loop through the sections of the document.
  • Get a section using Document.Sections.get_Item() method.
  • Get the margins of the section using Section.PageSetup.Margins property.
  • Set the top, bottom, left, and right margins using the properties of the Margins class.
  • Save the document using Document.SaveToFile() method.
  • Python
from spire.doc import *
from spire.doc.common import *

# Create an object of Document class
doc = Document()

# Load a Word document
doc.LoadFromFile("Sample.docx")

# Loop through the sections of the document
for i in range(doc.Sections.Count):
    # Get a section
    section = doc.Sections.get_Item(i)
    # Get the margins of the section
    margins = section.PageSetup.Margins
    # Set the top, bottom, left, and right margins
    margins.Top = 17.9
    margins.Bottom = 17.9
    margins.Left = 20.9
    margins.Right = 20.9
    # margins.All = 17.9

# Save the document
doc.SaveToFile("output/SetPageMargins.docx", FileFormat.Auto)
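
The margin values in the code above are plain numbers. Assuming they are measured in points (1/72 inch), which is the usual unit in word-processing APIs but is not stated in this article, the following sketch shows how values in inches or millimeters could be converted before being assigned. The helper names are illustrative.

```python
POINTS_PER_INCH = 72.0
MM_PER_INCH = 25.4


def inches_to_points(inches):
    """Convert inches to points (assumes 72 points per inch)."""
    return inches * POINTS_PER_INCH


def mm_to_points(millimeters):
    """Convert millimeters to points."""
    return millimeters / MM_PER_INCH * POINTS_PER_INCH


def points_to_mm(points):
    """Convert points back to millimeters."""
    return points / POINTS_PER_INCH * MM_PER_INCH


one_inch_margin = inches_to_points(1.0)
top_margin_mm = points_to_mm(17.9)
print(one_inch_margin, round(top_margin_mm, 2))
```

Under that assumption, a one-inch margin would be set with a value such as margins.Top = inches_to_points(1.0).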


We are pleased to announce the release of Spire.XLS for Java 13.10.0. This version supports verifying whether the password for restricted editing is correct. The conversion of Excel to PDF, images and OFD has also been enhanced. Besides, some known issues are fixed successfully in this version, such as the issue that the program threw "Invalid ValidationAlertType string val" when loading an Excel file. More details are listed below.

Here is a list of changes made in this release

Category | ID | Description
New feature | SPIREXLS-4896 | Supports verifying whether the password for restricted editing is correct: worksheet.checkProtectionPassword(String password)
Bug | SPIREXLS-4879 | Fixes the issue that the content of the document was incorrect when converting Excel to PDF.
Bug | SPIREXLS-4890, SPIREXLS-4908 | Fixes the issue that the content in charts was incorrect when converting Excel to images.
Bug | SPIREXLS-4893 | Fixes the issue that table borders were lost when converting Excel to OFD.
Bug | SPIREXLS-4900 | Fixes the issue that the program threw "Invalid ValidationAlertType string val" when loading an Excel file.
Bug | SPIREXLS-4901 | Fixes the issue that pivot table calculated fields couldn’t be added as column fields.
Bug | SPIREXLS-4902 | Fixes the issue that the names of pivot table calculated fields were automatically prefixed with "Sum of".
Bug | SPIREXLS-4910 | Fixes the issue that the program threw "java.lang.ClassException" when loading an Excel file.
Click the link below to download Spire.XLS for Java 13.10.0:
Monday, 09 October 2023 01:26

Python: Extract Text from a PDF Document

Extracting text from a PDF document is a process that allows one to retrieve the textual content within a PDF file. PDFs, or Portable Document Format files, are widely used for their ability to preserve the formatting and layout of documents across different platforms. However, extracting text from a PDF can be necessary when you need to work with the text separately, such as analyzing data, conducting research, or converting it into another format. In this article, you will learn how to extract text from a PDF document in Python using Spire.PDF for Python.

Install Spire.PDF for Python

This scenario requires Spire.PDF for Python and plum-dispatch v1.7.4. They can be easily installed in your VS Code through the following pip command.

pip install Spire.PDF

If you are unsure how to install, please refer to this tutorial: How to Install Spire.PDF for Python in VS Code

Extract Text from a Particular Page in Python

The PdfTextExtractor class in Spire.PDF for Python allows you to extract text from a particular page, while the PdfTextExtractOptions class enables you to control the extraction process and define how the text will be extracted. The following are the steps to extract text from a certain page of a PDF document.

  • Create a PdfDocument object.
  • Load a PDF file using PdfDocument.LoadFromFile() method.
  • Get the specific page through PdfDocument.Pages[index] property.
  • Create a PdfTextExtractor object.
  • Create a PdfTextExtractOptions object, and set the IsExtractAllText property to true.
  • Extract text from the selected page using PdfTextExtractor.ExtractText() method.
  • Write the extracted text to a TXT file.
  • Python
from spire.pdf.common import *
from spire.pdf import *

# Create a PdfDocument object
doc = PdfDocument()

# Load a PDF document
doc.LoadFromFile('C:/Users/Administrator/Desktop/Terms of service.pdf')

# Get a specific page
page = doc.Pages[1]

# Create a PdfTextExtractor object
textExtractor = PdfTextExtractor(page)

# Create a PdfTextExtractOptions object
extractOptions = PdfTextExtractOptions()

# Set IsExtractAllText to True
extractOptions.IsExtractAllText = True

# Extract text from the page keeping white spaces
text = textExtractor.ExtractText(extractOptions)

# Write text to a txt file 
with open('output/TextOfPage.txt', 'w', encoding='utf-8') as file:
    file.write(text)


Extract Text from a Rectangle Area in Python

The PdfTextExtractOptions.ExtractArea property specifies a rectangle area from which the text will be extracted. The following are the steps to extract text from a rectangle area of a page using Spire.PDF for Python.

  • Create a PdfDocument object.
  • Load a PDF file using PdfDocument.LoadFromFile() method.
  • Get the specific page through PdfDocument.Pages[index] property.
  • Create a PdfTextExtractor object.
  • Create a PdfTextExtractOptions object, and specify the rectangle area through the ExtractArea property of it.
  • Extract text from the rectangle using PdfTextExtractor.ExtractText() method.
  • Write the extracted text to a TXT file.
  • Python
from spire.pdf.common import *
from spire.pdf import *

# Create a PdfDocument object
doc = PdfDocument()

# Load a PDF document
doc.LoadFromFile('C:/Users/Administrator/Desktop/Terms of service.pdf')

# Get a specific page
page = doc.Pages[1]

# Create a PdfTextExtractor object
textExtractor = PdfTextExtractor(page)

# Create a PdfTextExtractOptions object
extractOptions = PdfTextExtractOptions()

# Set the rectangle area
extractOptions.ExtractArea = RectangleF(0.0, 100.0, 890.0, 80.0)

# Extract text from the rectangle area keeping white spaces
text = textExtractor.ExtractText(extractOptions)

# Write text to a txt file 
with open('output/TextOfRectangle.txt', 'w') as file:
    file.write(text)
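
The rectangle in the example above is given as four numbers. Assuming they follow the common x, y, width, height order (an assumption based on the call shown above, not confirmed by this article), the sketch below converts a region known by its corner coordinates into that form. rect_from_corners is a hypothetical helper that returns a plain tuple.

```python
def rect_from_corners(left, top, right, bottom):
    """Return (x, y, width, height) for a region given by its corners.

    Raises ValueError if the corners do not describe a non-empty area.
    """
    if right <= left or bottom <= top:
        raise ValueError("right/bottom must be greater than left/top")
    return (float(left), float(top), float(right - left), float(bottom - top))


# Corners chosen so the result matches the numbers used in the example above
area = rect_from_corners(0, 100, 890, 180)
print(area)
```

Under that assumption, the tuple could be unpacked into the constructor, for example RectangleF(*area).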


Extract Text from a PDF Document Using Simple Extraction Strategy in Python

The above methods extract text line by line. The SimpleExtraction strategy, by contrast, keeps track of the current Y position of each string and inserts a line break into the output whenever the Y position changes. The following are the detailed steps.

  • Create a PdfDocument object.
  • Load a PDF file using PdfDocument.LoadFromFile() method.
  • Get the specific page through PdfDocument.Pages[index] property.
  • Create a PdfTextExtractor object.
  • Create a PdfTextExtractOptions object and set the IsSimpleExtraction property to true.
  • Extract text from the selected page using PdfTextExtractor.ExtractText() method.
  • Write the extracted text to a TXT file.
  • Python
from spire.pdf.common import *
from spire.pdf import *

# Create a PdfDocument object
doc = PdfDocument()

# Load a PDF document
doc.LoadFromFile('C:/Users/Administrator/Desktop/Invoice.pdf')

# Get a specific page
page = doc.Pages[0]

# Create a PdfTextExtractor object
textExtractor = PdfTextExtractor(page)

# Create a PdfTextExtractOptions object
extractOptions = PdfTextExtractOptions()

# Set IsSimpleExtraction to True
extractOptions.IsSimpleExtraction = True

# Extract text from the page using SimpleExtraction strategy
text = textExtractor.ExtractText(extractOptions)

# Write text to a txt file 
with open('output/SimplyExtraction.txt', 'w') as file:
    file.write(text)
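
To make the line-break rule described in this section concrete, here is a toy model of it. It is a simplified illustration, not Spire.PDF's actual implementation: the fragments are hypothetical (text, y) pairs in reading order, and joining fragments on the same line with a single space is an assumption of this sketch.

```python
def simple_extract(fragments):
    """Join text fragments, starting a new line whenever the Y position changes."""
    lines = []
    current_y = None
    for text, y in fragments:
        if current_y is None or y != current_y:
            # Y position changed: start a new output line
            lines.append([text])
            current_y = y
        else:
            # Same Y position: continue the current line
            lines[-1].append(text)
    return "\n".join(" ".join(parts) for parts in lines)


# Hypothetical fragments from two lines of an invoice
fragments = [
    ("Invoice", 780.0),
    ("No. 1024", 780.0),
    ("Total:", 760.0),
    ("$50.00", 760.0),
]
extracted = simple_extract(fragments)
print(extracted)
```

Because the rule depends only on the Y position, text from side-by-side columns that share a baseline ends up on the same output line.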


PDF documents, while preserving the special formatting and visual style of content, often present difficulties when it comes to editing, copying, or searching for specific information. However, by extracting text and images from PDF files, users can easily process them or save them in other formats for further use, thus solving the difficulties in editing the content of PDF files. This article will explain how to use Spire.PDF for Python to extract text and images from PDF documents in Python programs.

Install Spire.PDF for Python

This scenario requires Spire.PDF for Python and plum-dispatch v1.7.4. They can be easily installed in your VS Code through the following pip command.

pip install Spire.PDF

If you are unsure how to install, please refer to this tutorial: How to Install Spire.PDF for Python in VS Code

Extract Text from PDF Documents

Spire.PDF for Python provides the PdfPageBase.ExtractText() method which can be used to extract all text from a PDF page (including blank space), and return it as a string. The detailed steps for extracting all text from a PDF document are as follows:

  • Create an object of PdfDocument class.
  • Load a PDF document using PdfDocument.LoadFromFile() method.
  • Iterate through the pages of the document, extract the text from the pages using PdfPageBase.ExtractText() method, and write it to a text file.
  • Python
from spire.pdf import *
from spire.pdf.common import *

# Create an instance of the PdfDocument class
pdf = PdfDocument()

# Load the PDF document
pdf.LoadFromFile("Sample.pdf")

# Create a TXT file to save the extracted text
extractedText = open("output/ExtractedText.txt", "w", encoding="utf-8")

# Iterate through the pages of the document
for i in range(pdf.Pages.Count):
    # Get the page
    page = pdf.Pages.get_Item(i)
    # Extract text from the page
    text = page.ExtractText()
    # Write the text to the text file
    extractedText.write(text + "\n")

extractedText.close()
pdf.Close()

Python: Extract Text and Images from PDF Documents
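
Since the extracted text includes blank space, you may want to tidy it before further processing. The following is a minimal sketch using only the standard library; tidy_extracted_text is a hypothetical helper, and the sample string is made up for illustration.

```python
import re


def tidy_extracted_text(text):
    """Strip trailing spaces and collapse runs of blank lines to a single one."""
    lines = [line.rstrip() for line in text.splitlines()]
    tidy = "\n".join(lines)
    # Three or more consecutive newlines mean two or more blank lines
    tidy = re.sub(r"\n{3,}", "\n\n", tidy)
    return tidy.strip("\n")


sample = "Title   \n\n\n\nBody  text \n"
cleaned = tidy_extracted_text(sample)
print(repr(cleaned))
```

Spacing inside a line is left untouched on purpose, because it often reflects the column layout of the original page.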

Extract Text from a Rectangular Area of a PDF Page

The PdfPageBase.ExtractText() method also supports extracting text from a rectangular area on a PDF page. The detailed steps are as follows:

  • Create an object of the PdfDocument class.
  • Load a PDF document using the PdfDocument.LoadFromFile() method.
  • Get a page using PdfDocument.Pages.get_Item() method.
  • Extract the text from a rectangular area on the page using PdfPageBase.ExtractText() method.
  • Save the extracted text to a text file.
  • Python
from spire.pdf import *
from spire.pdf.common import *

# Create an object of PdfDocument class
pdf = PdfDocument()

# Load a PDF document
pdf.LoadFromFile("Sample.pdf")

# Get the first page
page = pdf.Pages.get_Item(0)

# Extract text from a rectangular area on the page
text = page.ExtractText(RectangleF(90.0, 220.0, 770.0, 130.0))

# Save the extracted text to a text file
extractedText = open("output/ExtractedTextArea.txt", "w", encoding="utf-8")
extractedText.write(text)
extractedText.close()
pdf.Close()


Extract All the Images from a PDF Document

Spire.PDF for Python also provides the PdfPageBase.ExtractImages() method to extract all the images from a PDF page and return them as a list. The detailed steps for extracting all the images from a PDF document are as follows:

  • Create an object of PdfDocument class.
  • Load a PDF document using the PdfDocument.LoadFromFile() method.
  • Iterate through the pages in the document, extract the images from the pages using PdfPageBase.ExtractImages() method, and put them into a list.
  • Save the images in the list as PNG files.
  • Python
from spire.pdf import *
from spire.pdf.common import *

# Create an instance of PdfDocument class
pdf = PdfDocument()

# Load the PDF document
pdf.LoadFromFile("Sample.pdf")

# Create a list to store the images
images = []

# Iterate through the pages in the document
for i in range(pdf.Pages.Count):
    # Get a page
    page = pdf.Pages.get_Item(i)
    # Extract the images from the page and store them in the created list
    for img in page.ExtractImages():
        images.append(img)

# Save the images in the list as PNG files, numbering them from 1
for i, image in enumerate(images, start=1):
    image.Save("output/Images/Image-{0:d}.png".format(i), ImageFormat.get_Png())

pdf.Close()
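
The example above writes into an output/Images folder. In case that folder does not exist yet, the sketch below creates it first and builds zero-padded file names so the images sort correctly. image_output_paths is a hypothetical helper, and a temporary directory is used here only so that the snippet can run anywhere.

```python
import os
import tempfile


def image_output_paths(count, directory, prefix="Image", extension="png"):
    """Create the directory if needed and return numbered file paths."""
    os.makedirs(directory, exist_ok=True)
    width = max(2, len(str(count)))
    return [
        os.path.join(directory, f"{prefix}-{i:0{width}d}.{extension}")
        for i in range(1, count + 1)
    ]


with tempfile.TemporaryDirectory() as root:
    target = os.path.join(root, "output", "Images")
    paths = image_output_paths(3, target)
    names = [os.path.basename(p) for p in paths]
    created = os.path.isdir(target)

print(names)
```

Each returned path could then be passed to image.Save() in place of the formatted string used above.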


Saturday, 07 October 2023 01:13

Python: Add Bookmarks to a Word Document

Adding bookmarks to Word documents is a useful feature that allows users to mark specific locations within their documents for quick reference or navigation. Bookmarks serve as virtual placeholders, making it easier to find and revisit important sections of a document without scrolling through lengthy pages. In this article, you will learn how to add bookmarks to a Word document in Python using Spire.Doc for Python.

Install Spire.Doc for Python

This scenario requires Spire.Doc for Python and plum-dispatch v1.7.4. They can be easily installed in your VS Code through the following pip command.

pip install Spire.Doc

If you are unsure how to install, please refer to this tutorial: How to Install Spire.Doc for Python in VS Code

Add Bookmarks to a Paragraph in Python

Spire.Doc for Python offers the BookmarkStart to represent the start of a bookmark and the BookmarkEnd to represent the end of a bookmark. To bookmark a paragraph, a BookmarkStart object is placed at the beginning of the paragraph and a BookmarkEnd object is appended at the end of the paragraph. The following are the detailed steps.

  • Create a Document object.
  • Load a Word file using Document.LoadFromFile() method.
  • Get a specific paragraph through Document.Sections[index].Paragraphs[index] property.
  • Create a BookmarkStart using Paragraph.AppendBookmarkStart() method and insert it at the beginning of the paragraph using Paragraph.Items.Insert() method.
  • Append a BookmarkEnd at the end of the paragraph using Paragraph.AppendBookmarkEnd() method.
  • Save the document to a different Word file using Document.SaveToFile() method.
  • Python
from spire.doc import *
from spire.doc.common import *

# Create a Document object
doc = Document()

# Load a sample Word file
doc.LoadFromFile('C:/Users/Administrator/Desktop/input.docx')

# Get the third paragraph (index 2)
paragraph = doc.Sections[0].Paragraphs[2]

# Create a bookmark start
start = paragraph.AppendBookmarkStart('myBookmark')

# Insert it at the beginning of the paragraph
paragraph.Items.Insert(0, start)

# Append a bookmark end at the end of the paragraph
paragraph.AppendBookmarkEnd('myBookmark')

# Save the file
doc.SaveToFile('output/AddBookmarkToParagraph.docx', FileFormat.Docx2019)


Add Bookmarks to Selected Text in Python

To bookmark a piece of text, first find the text in the document and get its position inside its owner paragraph. Then place a BookmarkStart before it and a BookmarkEnd after it. The detailed steps are as follows.

  • Create a Document object.
  • Load a Word file using Document.LoadFromFile() method.
  • Find the string to be marked from the document.
  • Get its owner paragraph and its position inside the paragraph.
  • Insert a BookmarkStart before the text and a BookmarkEnd after the text.
  • Save the document to a different Word file using Document.SaveToFile() method.
  • Python
from spire.doc import *
from spire.doc.common import *

# Create a Document object
doc = Document()

# Load a sample Word file
doc.LoadFromFile('C:/Users/Administrator/Desktop/input.docx')

# Specify the string to find
stringToFind = 'programming paradigms'

# Find the selected text from the document
finds = doc.FindAllString(stringToFind, False, True)
specificText = finds[0]

# Find the paragraph where the text is located
paragraph = specificText.GetAsOneRange().OwnerParagraph

# Get the index of the text in the paragraph
index = paragraph.ChildObjects.IndexOf(specificText.GetAsOneRange())

# Create a bookmark start
start = paragraph.AppendBookmarkStart("myBookmark")

# Insert the bookmark start at the index position
paragraph.ChildObjects.Insert(index, start)

# Create a bookmark end
end = paragraph.AppendBookmarkEnd("myBookmark")

# Insert the bookmark end right after the selected text
# (the text moved to index + 1 when the bookmark start was inserted)
paragraph.ChildObjects.Insert(index + 2, end)

# Save the document to a different file
doc.SaveToFile("output/AddBookmarkToSelectedText.docx", FileFormat.Docx2019)
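
The offset index + 2 used for the bookmark end follows from how insertion shifts positions: once the bookmark start is inserted at index, the matched text moves to index + 1, so the first free slot after it is index + 2. The sketch below models the paragraph's child objects as a plain Python list. It does not call Spire.Doc; wrap_with_markers and the marker strings are hypothetical names used only for illustration, and it assumes the matched text occupies a single child object, as GetAsOneRange() suggests.

```python
def wrap_with_markers(children, index, start="[start]", end="[end]"):
    """Return a copy of children with markers placed around children[index]."""
    result = list(children)
    result.insert(index, start)    # the wrapped item shifts to index + 1
    result.insert(index + 2, end)  # first slot after the wrapped item
    return result


# A paragraph modelled as three text runs
runs = ["Python supports several ", "programming paradigms", "."]
index = runs.index("programming paradigms")

wrapped = wrap_with_markers(runs, index)
print(wrapped)
```

Using index + 1 for the end marker instead would place it between the start marker and the text, leaving the bookmark empty in this model.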

Apply for a Temporary License

If you'd like to remove the evaluation message from the generated documents, or to get rid of the function limitations, please request a 30-day trial license for yourself.

Thursday, 28 September 2023 08:06

Spire.Office 8.9.3 is released

We are delighted to announce the release of Spire.Office 8.9.3. In this version, Spire.Presentation 8.9.4 supports setting the time for automatic slide switching as well as setting and reading the transparency and brightness of gradient stop styles; Spire.PDF 9.9.9 enhances the conversion from PDF to images and OFD to PDF; Spire.Doc 11.9.19 enhances the conversion from Word and HTML to PDF. Besides, a lot of known issues are fixed successfully in this version. More details are listed below.

In this version, the most recent versions of Spire.Doc, Spire.PDF, Spire.XLS, Spire.Presentation, Spire.Email, Spire.DocViewer, Spire.PDFViewer, Spire.Spreadsheet, Spire.OfficeViewer, Spire.DataExport, Spire.Barcode are included.

DLL Versions:

  • Spire.Doc.dll v11.9.19
  • Spire.Pdf.dll v9.9.9
  • Spire.XLS.dll v13.9.1
  • Spire.Presentation.dll v8.9.4
  • Spire.Email.dll v6.8.0
  • Spire.DocViewer.Forms.dll v8.7.0
  • Spire.PdfViewer.Forms.dll v7.12.0
  • Spire.PdfViewer.Asp.dll v7.12.0
  • Spire.Spreadsheet.dll v7.4.2
  • Spire.OfficeViewer.Forms.dll v8.9.3
  • Spire.Barcode.dll v7.2.1
  • Spire.DataExport.dll v4.9.0
  • Spire.DataExport.ResourceMgr.dll v2.1.0

Here is a list of changes made in this release

Spire.Presentation

Category ID Description
New feature SPIREPPT-2351 Supports setting the time for automatic slide switching.
Presentation ppt = new Presentation();
ppt.LoadFromFile("input.pptx");
ppt.Slides[0].SlideShowTransition.AdvanceAfterTime = 1000;
ppt.Slides[1].SlideShowTransition.SelectedAdvanceAfterTime = false;
ppt.SaveToFile("output.pptx", FileFormat.Pptx2013);
ppt.Dispose();
New feature SPIREPPT-2353 Optimizes the names of all options under Radial Gradient Style type, marking the original options as deprecated and adding the same options as in MS PowerPoint tools.
Previous options:
FromCorner1
FromCorner2
FromCorner3
FromCorner4
New options:
FromTopLeftCorner
FromBottomLeftCorner
FromTopRightCorner
FromBottomRightCorner
New feature SPIREPPT-2354 Supports setting and reading the transparency and brightness of the gradient stop styles.
Presentation ppt = new Presentation();
ppt.LoadFromFile("input.pptx");
StringBuilder stringBuilder = new StringBuilder();
IAutoShape shape = (ppt.Slides[0].Shapes[0] as GroupShape).Shapes[2] as IAutoShape;
GradientStopCollection stops = shape.Fill.Gradient.GradientStops;
for (int i = 0; i < stops.Count; i++)
{
    float transparency = stops[i].Color.Transparency;
    float brightness = stops[i].Color.Brightness;
    stringBuilder.AppendLine("stops" + i + " transparency: " + transparency + " brightness: " + brightness);
}
File.WriteAllText("output.txt", stringBuilder.ToString());

stops[0].Color.Transparency = 0.5f;
stops[0].Color.Brightness = -0.32f;
ppt.SaveToFile("output.pptx", FileFormat.Auto);
ppt.Dispose();
Bug SPIREPPT-2322 Fixes the issue that the collection of corner coordinates of polygons obtained was incomplete.
Bug SPIREPPT-2323 Fixes the issue that the text direction changed after saving slides to images.
Bug SPIREPPT-2334 Fixes the issue that it failed to retrieve connection point coordinates for line connector shapes.

Spire.PDF

Category ID Description
Bug SPIREPDF-6130 Fixes the issue that the program threw "System.StackOverflowException" when converting PDF to images.
Bug SPIREPDF-6219 Fixes the issue that the program threw "System.ArgumentOutOfRangeException" when drawing HTML content.
Bug SPIREPDF-6229 Fixes the issue that the size of the split document was incorrect.
Bug SPIREPDF-6245 Fixes the issue that the XFA checkbox form fields couldn't be filled.
Bug SPIREPDF-6254 Fixes the issue that the program threw "System.FormatException" when converting OFD to PDF.
Bug SPIREPDF-6259 Fixes the issue that part of the content was lost when printing PDF files.
Bug SPIREPDF-6272 Fixes the issue that the FontSizeAuto property for textbox form fields was incorrect.

Spire.Doc

Category ID Description
Bug SPIREDOC-9455 Fixes the issue that the content was incorrect after adding a footer copied from another document to a document and then converting it to a PDF document.
Bug SPIREDOC-9466 Fixes the issue that extra shapes appeared after loading a document and saving it as a new document.
Bug SPIREDOC-9699 Fixes the issue that the font of a document changed after updating the fields in the document and converting it to PDF.
Bug SPIREDOC-9743 Fixes the issue that extra pictures appeared after loading a document and saving it as a new document.
Bug SPIREDOC-9767 Fixes the issue that recognizing the LaTeX formula code "therefore" failed.
Bug SPIREDOC-9800 Fixes the issue that the program threw System.StackOverflowException when loading a document.
Bug SPIREDOC-9833 Fixes the issue that the content was garbled after converting Doc documents to PDF documents.
Bug SPIREDOC-9834 Fixes the issue that the program threw System.NullReferenceException when converting Docx documents to PDF documents.
Bug SPIREDOC-9836 Fixes the issue that the program threw System.NullReferenceException when replacing text.
Bug SPIREDOC-9852 Fixes the issue that extra pictures appeared after replacing text and saving the document to PDF.
Bug SPIREDOC-9861 Fixes the issue that the program failed to recognize the "<" MathML format in HTML content.
Bug SPIREDOC-9869 Fixes the issue that pictures were lost after converting an HTML document to a PDF document.
Bug SPIREDOC-9878 Fixes the issue that the symbols were rotated after converting Docx documents to PDF documents.