Java: Extract Text from PowerPoint

PowerPoint, developed by Microsoft, is a widely used presentation format for creating visual, interactive content. A presentation can combine text, images, and other elements, which makes it useful in scenarios such as business introductions and academic talks. If you need to edit or reuse the text in a presentation, extracting it programmatically and saving it to a new file is an effective approach. This article shows how to extract text from PowerPoint using Spire.Presentation for Java.

Install Spire.Presentation for Java

First, add the Spire.Presentation.jar file as a dependency in your Java program. The JAR file can be downloaded from this link. If you use Maven, you can import the JAR file into your application by adding the following code to your project's pom.xml file.

<repositories>
    <repository>
        <id>com.e-iceblue</id>
        <name>e-iceblue</name>
        <url>https://repo.e-iceblue.com/nexus/content/groups/public/</url>
    </repository>
</repositories>
<dependencies>
    <dependency>
        <groupId>e-iceblue</groupId>
        <artifactId>spire.presentation</artifactId>
        <version>9.2.2</version>
    </dependency>
</dependencies>
    

Extract Text from the Whole PowerPoint File

Spire.Presentation for Java lets you loop through all slides and extract the text of each paragraph with the ParagraphEx.getText() method. The detailed steps are as follows.

  • Create an object of the Presentation class.
  • Load a sample presentation using the Presentation.loadFromFile() method.
  • Create an object of the StringBuilder class.
  • Loop through the shapes on each slide and the paragraphs in each shape.
  • Extract the text of each paragraph by calling the ParagraphEx.getText() method and append it to the StringBuilder object.
  • Create an object of the FileWriter class and write the extracted text to a new .txt file.
import com.spire.presentation.*;

import java.io.*;

public class ExtractText {
    public static void main(String[] args) throws Exception {

        //Create an object of Presentation class
        Presentation presentation = new Presentation();

        //Load a sample presentation
        presentation.loadFromFile("sample.pptx");

        //Create a StringBuilder object
        StringBuilder buffer = new StringBuilder();

        //Loop through each slide and extract text 
        for (Object slide : presentation.getSlides()) {
            for (Object shape : ((ISlide) slide).getShapes()) {
                if (shape instanceof IAutoShape) {
                    for (Object tp : ((IAutoShape) shape).getTextFrame().getParagraphs()) {
                        buffer.append(((ParagraphEx) tp).getText()+"\n");
                    }
                }
            }
        }

        //Write the extracted text to a new .txt file (the "output" folder must already exist)
        try (FileWriter writer = new FileWriter("output/ExtractAllText.txt")) {
            writer.write(buffer.toString());
        }
        presentation.dispose();
    }
}
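
The file-writing step can also be sketched with java.nio instead of FileWriter. The helper below is hypothetical and not part of Spire.Presentation: the class name TextFileOutput and the method writeUtf8 are assumptions made for illustration. It creates the output folder if it is missing and writes the text as UTF-8 rather than relying on the platform default encoding.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper, not part of Spire.Presentation: it only covers the file-writing step
final class TextFileOutput {

    // Writes text to the target path as UTF-8, creating missing parent folders first
    static Path writeUtf8(String text, String target) throws IOException {
        Path path = Paths.get(target);
        Path parent = path.getParent();
        if (parent != null) {
            Files.createDirectories(parent);
        }
        // With no options given, Files.write creates the file or overwrites an existing one
        return Files.write(path, text.getBytes(StandardCharsets.UTF_8));
    }

    public static void main(String[] args) throws IOException {
        // In the example above, this string would come from buffer.toString()
        Path written = writeUtf8("Slide title\nFirst bullet\n", "output/ExtractAllText.txt");
        System.out.println("Wrote " + Files.size(written) + " bytes to " + written);
    }
}
```

With such a helper, the FileWriter lines in the example could be replaced by a single call that passes buffer.toString() and the output path.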


Extract Text from a Specific Slide

Spire.Presentation for Java also lets you extract text from a specific slide. Get the desired slide by calling the Presentation.getSlides().get() method, then extract text from the paragraphs on it. The detailed steps are as follows.

  • Create an object of the Presentation class.
  • Load a sample presentation using the Presentation.loadFromFile() method.
  • Create an object of the StringBuilder class.
  • Get the first slide of the file by calling the Presentation.getSlides().get() method.
  • Loop through the shapes on the slide and the paragraphs in each shape.
  • Extract the text of each paragraph by calling the ParagraphEx.getText() method and append it to the StringBuilder object.
  • Create an object of the FileWriter class and write the extracted text to a new .txt file.
import com.spire.presentation.*;

import java.io.*;

public class ExtractText {
    public static void main(String[] args) throws Exception {

        //Create an object of Presentation class
        Presentation presentation = new Presentation();

        //Load a sample presentation
        presentation.loadFromFile("sample.pptx");

        //Create a StringBuilder object
        StringBuilder buffer = new StringBuilder();

        //Get the first slide of the presentation (the index is zero-based)
        ISlide slide = presentation.getSlides().get(0);

        //Loop through the paragraphs in each shape and extract text
        for (Object shape : slide.getShapes()) {
            if (shape instanceof IAutoShape) {
                for (Object tp : ((IAutoShape) shape).getTextFrame().getParagraphs()) {
                    buffer.append(((ParagraphEx) tp).getText()+"\n");
                }
            }
        }

        //Write the extracted text to a new .txt file (the "output" folder must already exist)
        try (FileWriter writer = new FileWriter("output/ExtractSlideText.txt")) {
            writer.write(buffer.toString());
        }
        presentation.dispose();
    }
}
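
Because get(0) returns the first slide, the index passed to get() is zero-based, while PowerPoint numbers slides from 1. The sketch below shows one way to convert and validate a slide number before using it. The class SlideIndex and the method toZeroBased are hypothetical names, not part of Spire.Presentation, and the slide count is assumed to be obtained from the loaded presentation.

```java
// Hypothetical helper, not part of Spire.Presentation
final class SlideIndex {

    // Converts a 1-based slide number (as shown in PowerPoint) to the 0-based index used by get()
    static int toZeroBased(int slideNumber, int slideCount) {
        if (slideNumber < 1 || slideNumber > slideCount) {
            throw new IllegalArgumentException(
                    "Slide " + slideNumber + " is out of range 1.." + slideCount);
        }
        return slideNumber - 1;
    }

    public static void main(String[] args) {
        // Assumed value for illustration; in practice, use the number of slides in the loaded file
        int slideCount = 5;
        System.out.println(toZeroBased(1, slideCount)); // prints 0
        System.out.println(toZeroBased(5, slideCount)); // prints 4
    }
}
```

Validating the number first gives a clear error message for a slide that does not exist, instead of a failure inside the library call.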


Apply for a Temporary License

If you'd like to remove the evaluation message from the generated documents or lift the functional limitations, please request a 30-day trial license.