Knowledgebase (2018)
Python: Add or Extract Audio and Video from PowerPoint Documents
2024-02-21 00:56:54 Written by support iceblueAdding or extracting audio and video in a PowerPoint document can greatly enrich the presentation content, enhance audience engagement, and improve comprehension. By adding audio, you can include background music, narration, or sound effects to make the content more lively and emotionally engaging. Inserting videos allows you to showcase dynamic visuals, demonstrate processes, or explain complex concepts, helping the audience to understand the content more intuitively. Extracting audio and video can help preserve important information or resources for reuse when needed. This article will introduce how to use Python and Spire.Presentation for Python to add or extract audio and video in PowerPoint.
- Add Audio in PowerPoint Documents in Python
- Extract Audio from PowerPoint Documents in Python
- Add Video in PowerPoint Documents in Python
- Extract Video from PowerPoint Documents in Python
Install Spire.Presentation for Python
This scenario requires Spire.Presentation for Python and plum-dispatch v1.7.4. They can be easily installed in your Windows through the following pip command.
pip install Spire.Presentation
If you are unsure how to install, please refer to this tutorial: How to Install Spire.Presentation for Python on Windows
Add Audio in PowerPoint Documents in Python
Spire.Presentation for Python provides the Slide.Shapes.AppendAudioMedia() method, which can be used to add audio files to slides. The specific steps are as follows:
- Create an object of the Presentation class.
- Use the RectangleF.FromLTRB() method to create a rectangle.
- In the shapes collection of the first slide, use the Slide.Shapes.AppendAudioMedia() method to add the audio file to the previously created rectangle.
- Use the Presentation.SaveToFile() method to save the document as a PowerPoint file.
- Python
from spire.presentation.common import * from spire.presentation import * # Create a presentation object presentation = Presentation() # Create an audio rectangle audioRect = RectangleF.FromLTRB(200, 150, 310, 260) # Add audio presentation.Slides[0].Shapes.AppendAudioMedia("data/Music.wav", audioRect) # Save the presentation to a file presentation.SaveToFile("AddAudio.pptx", FileFormat.Pptx2016) # Release resources presentation.Dispose()
Extract Audio from PowerPoint Documents in Python
To determine if a shape is of audio type, you can check if its type is IAudio. If the shape is of audio type, you can use the IAudio.Data property to retrieve audio data. The specific steps are as follows:
- Create an object of the Presentation class.
- Use the Presentation.LoadFromFile() method to load the PowerPoint document.
- Iterate through the shapes collection on the first slide, checking if each shape is of type IAudio.
- If the shape is of type IAudio, use IAudio.Data property to retrieve the audio data from the audio object.
- Use the AudioData.SaveToFile() method to save the audio data to a file.
- Python
from spire.presentation.common import * from spire.presentation import * # Create a presentation object presentation = Presentation() # Load a presentation from a file presentation.LoadFromFile("Audio.pptx") # Initialize a counter i = 1 # Iterate through shapes in the first slide for shape in presentation.Slides[0].Shapes: # Check if the shape is of audio type if isinstance(shape, IAudio): # Get the audio data and save it to a file AudioData = shape.Data AudioData.SaveToFile("ExtractAudio_"+str(i)+".wav") i = i + 1 # Release resources presentation.Dispose()
Add Video in PowerPoint Documents in Python
Using the Slide.Shapes.AppendVideoMedia() method, you can add video files to slides. The specific steps are as follows:
- Create an object of the Presentation class.
- Use the RectangleF.FromLTRB() method to create a rectangle.
- In the shapes collection of the first slide, use the Slide.Shapes.AppendVideoMedia() method to add the video file to the previously created rectangle.
- Use the video.PictureFill.Picture.Url property to set the cover image of the video.
- Use the Presentation.SaveToFile() method to save the document as a PowerPoint file.
- Python
from spire.presentation.common import * from spire.presentation import * # Create a presentation object presentation = Presentation() # Create a video rectangle videoRect = RectangleF.FromLTRB(200, 150, 450, 350) # Add video video = presentation.Slides[0].Shapes.AppendVideoMedia("data/Video.mp4", videoRect) video.PictureFill.Picture.Url = "data/Video.png" # Save the presentation to a file presentation.SaveToFile("AddVideo.pptx", FileFormat.Pptx2016) # Release resources presentation.Dispose()
Extract Video from PowerPoint Documents in Python
The video type is IVideo. If the shape is of type IVideo, you can use the IVideo.EmbeddedVideoData property to retrieve video data. The specific steps are as follows:
- Create an object of the Presentation class.
- Use the Presentation.LoadFromFile() method to load the PowerPoint presentation.
- Iterate through the shapes collection on the first slide, checking if each shape is of type IVideo.
- If the shape is of type IVideo, use the IVideo.EmbeddedVideoData property to retrieve the video data from the video object.
- Use the VideoData.SaveToFile() method to save the video data to a file.
- Python
from spire.presentation.common import * from spire.presentation import * # Create a presentation object presentation = Presentation() # Load a presentation from a file presentation.LoadFromFile("Video.pptx") # Initialize a counter i = 1 # Iterate through each slide in the presentation for slide in presentation.Slides: # Iterate through shapes in each slide for shape in slide.Shapes: # Check if the shape is of video type if isinstance(shape, IVideo): # Get the video data and save it to a file VideoData = shape.EmbeddedVideoData VideoData.SaveToFile("ExtractVideo_"+str(i)+".avi") i = i + 1 # Release resources presentation.Dispose()
Apply for a Temporary License
If you'd like to remove the evaluation message from the generated documents, or to get rid of the function limitations, please request a 30-day trial license for yourself.
OLE (Object Linking and Embedding) objects in Word are files or data from other applications that can be inserted into a document. These objects can be edited and updated within Word, allowing you to seamlessly integrate content from various programs, such as Excel spreadsheets, PowerPoint presentations, or even multimedia files like images, audio, or video. In this article, we will introduce how to insert and extract OLE objects in a Word document in Python using Spire.Doc for Python.
Install Spire.Doc for Python
This scenario requires Spire.Doc for Python and plum-dispatch v1.7.4. They can be easily installed in your Windows through the following pip command.
pip install Spire.Doc
If you are unsure how to install, please refer to this tutorial: How to Install Spire.Doc for Python on Windows
Insert OLE Objects in Word in Python
Spire.Doc for Python provides the Paragraph.AppendOleObject(pathToFile:str, olePicture:DocPicture, type:OleObjectType) method to embed OLE objects in a Word document. The detailed steps are as follows.
- Create an object of the Document class.
- Load a Word document using the Document.LoadFromFile() method.
- Get a specific section using the Document.Sections.get_Item(index) method.
- Add a paragraph to the section using the Section.AddParagraph() method.
- Create an object of the DocPicture class.
- Load an image that will be used as the icon of the OLE object using the DocPicture.LoadImage() method and then set image width and height.
- Append an OLE object to the paragraph using the Paragraph.AppendOleObject(pathToFile:str, olePicture:DocPicture, type:OleObjectType) method.
- Save the result file using the Document.SaveToFile() method.
The following code example shows how to embed an Excel spreadsheet, a PDF file, and a PowerPoint presentation in a Word document using Spire.Doc for Python:
- Python
from spire.doc import * from spire.doc.common import * # Create an object of the Document class doc = Document() # Load a Word document doc.LoadFromFile("Example.docx") # Get the first section section = doc.Sections.get_Item(0) # Add a paragraph to the section para1 = section.AddParagraph() para1.AppendText("Excel File: ") # Load an image which will be used as the icon of the OLE object picture1 = DocPicture(doc) picture1.LoadImage("Excel-Icon.png") picture1.Width = 50 picture1.Height = 50 # Append an OLE object (an Excel spreadsheet) to the paragraph para1.AppendOleObject("Budget.xlsx", picture1, OleObjectType.ExcelWorksheet) # Add a paragraph to the section para2 = section.AddParagraph() para2.AppendText("PDF File: ") # Load an image which will be used as the icon of the OLE object picture2 = DocPicture(doc) picture2.LoadImage("PDF-Icon.png") picture2.Width = 50 picture2.Height = 50 # Append an OLE object (a PDF file) to the paragraph para2.AppendOleObject("Report.pdf", picture2, OleObjectType.AdobeAcrobatDocument) # Add a paragraph to the section para3 = section.AddParagraph() para3.AppendText("PPT File: ") # Load an image which will be used as the icon of the OLE object picture3 = DocPicture(doc) picture3.LoadImage("PPT-Icon.png") picture3.Width = 50 picture3.Height = 50 # Append an OLE object (a PowerPoint presentation) to the paragraph para3.AppendOleObject("Plan.pptx", picture3, OleObjectType.PowerPointPresentation) doc.SaveToFile("InsertOLE.docx", FileFormat.Docx2013) doc.Close()
Extract OLE Objects from Word in Python
To extract OLE objects from a Word document, you first need to locate the OLE objects within the document. Once located, you can determine the file format of each OLE object. Finally, you can save the data of each OLE object to a file in its native file format. The detailed steps are as follows.
- Create an instance of the Document class.
- Load a Word document using the Document.LoadFromFile() method.
- Iterate through all sections of the document.
- Iterate through all child objects in the body of each section.
- Identify the paragraphs within each section.
- Iterate through the child objects in each paragraph.
- Locate the OLE object within the paragraph.
- Determine the file format of the OLE object.
- Save the data of the OLE object to a file in its native file format.
The following code example shows how to extract the embedded Excel spreadsheet, PDF file, and PowerPoint presentation from a Word document using Spire.Doc for Python:
- Python
from spire.doc import * from spire.doc.common import * # Create an object of the Document class doc = Document() # Load a Word document doc.LoadFromFile("InsertOLE.docx") i = 1 # Iterate through all sections of the Word document for k in range(doc.Sections.Count): sec = doc.Sections.get_Item(k) # Iterate through all child objects in the body of each section for j in range(sec.Body.ChildObjects.Count): obj = sec.Body.ChildObjects.get_Item(j) # Check if the child object is a paragraph if isinstance(obj, Paragraph): par = obj if isinstance(obj, Paragraph) else None # Iterate through the child objects in the paragraph for m in range(par.ChildObjects.Count): o = par.ChildObjects.get_Item(m) # Check if the child object is an OLE object if o.DocumentObjectType == DocumentObjectType.OleObject: ole = o if isinstance(o, DocOleObject) else None s = ole.ObjectType # Check if the OLE object is a PDF file if s.startswith("AcroExch.Document"): ext = ".pdf" # Check if the OLE object is an Excel spreadsheet elif s.startswith("Excel.Sheet"): ext = ".xlsx" # Check if the OLE object is a PowerPoint presentation elif s.startswith("PowerPoint.Show"): ext = ".pptx" else: continue # Write the data of OLE into a file in its native format with open(f"Output/OLE{i}{ext}", "wb") as file: file.write(ole.NativeData) i += 1 doc.Close()
Apply for a Temporary License
If you'd like to remove the evaluation message from the generated documents, or to get rid of the function limitations, please request a 30-day trial license for yourself.
In addition to text and images, PDF files can also contain various types of attachments, such as documents, images, audio files, or other multimedia elements. Extracting attachments from PDF files allows users to retrieve and save the embedded content, enabling easy access and manipulation outside of the PDF environment. This process proves especially useful when dealing with PDFs that contain important supplementary materials, such as reports, spreadsheets, or legal documents.
In this article, you will learn how to extract attachments from a PDF document in Python using Spire.PDF for Python.
- Extract Document-Level Attachments from PDF in Python
- Extract Annotation Attachments from PDF in Python
Install Spire.PDF for Python
This scenario requires Spire.PDF for Python and plum-dispatch v1.7.4. They can be easily installed in your Windows through the following pip command.
pip install Spire.PDF
If you are unsure how to install, please refer to this tutorial: How to Install Spire.PDF for Python on Windows
Prerequisite Knowledge
There are generally two categories of attachments in PDF files: document-level attachments and annotation attachments. Below, you can find a table outlining the disparities between these two types of attachments and how they are represented in Spire.PDF.
Attachment type | Represented by | Definition |
Document level attachment | PdfAttachment class | A file attached to a PDF at the document level won't appear on a page, but can be viewed in the "Attachments" panel of a PDF reader. |
Annotation attachment | PdfAnnotationAttachment class | A file attached as an annotation can be found on a page or in the "Attachments" panel. An annotation attachment is shown as a paper clip icon on the page; reviewers can double-click the icon to open the file. |
Extract Document-Level Attachments from PDF in Python
To retrieve document-level attachments in a PDF document, you can use the PdfDocument.Attachments property. Each attachment has a PdfAttachment.FileName property, which provides the name of the specific attachment, including the file extension. Additionally, the PdfAttachment.Data property allows you to access the attachment's data. To save the attachment to a specific folder, you can utilize the PdfAttachment.Data.Save() method.
The steps to extract document-level attachments from a PDF using Python are as follows.
- Create a PdfDocument object.
- Load a PDF file using PdfDocument.LoadFromFile() method.
- Get a collection of attachments using PdfDocument.Attachments property.
- Iterate through the attachments in the collection.
- Get a specific attachment from the collection, and get the file name and data of the attachment using PdfAttachment.FileName property and PdfAttachment.Data property.
- Save the attachment to a specified folder using PdfAttachment.Data.Save() method.
- Python
from spire.pdf import * from spire.pdf.common import * # Create a PdfDocument object doc = PdfDocument() # Load a PDF file doc.LoadFromFile("C:\\Users\\Administrator\\Desktop\\Attachments.pdf") # Get the attachment collection from the document collection = doc.Attachments # Loop through the collection if collection.Count > 0: for i in range(collection.Count): # Get a specific attachment attactment = collection.get_Item(i) # Get the file name and data of the attachment fileName= attactment.FileName data = attactment.Data # Save it to a specified folder data.Save("Output\\ExtractedFiles\\" + fileName) doc.Close()
Extract Annotation Attachments from PDF in Python
The Annotations attachment is a page-based element. To retrieve annotations from a specific page, use the PdfPageBase.AnnotationsWidget property. You then need to determine if a particular annotation is an attachment. If it is, save it to the specified folder while retaining its original filename.
The following are the steps to extract annotation attachments from a PDF using Python.
- Create a PdfDocument object.
- Load a PDF file using PdfDocument.LoadFromFile() method.
- Iterate though the pages in the document.
- Get the annotations from a particular page using PdfPageBase.AnnotationsWidget property.
- Iterate though the annotations, and determine if a specific annotation is an attachment annotation.
- If it is, get the file name and data of the annotation using PdfAttachmentAnnotation.FileName property and PdfAttachmentAnnotation.Data property.
- Save the annotated attachment to a specified folder.
- Python
from spire.pdf import * from spire.pdf.common import * # Create a PdfDocument object doc = PdfDocument() # Load a PDF file doc.LoadFromFile("C:\\Users\\Administrator\\Desktop\\AnnotationAttachment.pdf") # Iterate through the pages in the document for i in range(doc.Pages.Count): # Get a specific page page = doc.Pages.get_Item(i) # Get the annotation collection of the page annotationCollection = page.AnnotationsWidget # If the page has annotations if annotationCollection.Count > 0: # Iterate through the annotations for j in range(annotationCollection.Count): # Get a specific annotation annotation = annotationCollection.get_Item(j) # Determine if the annotation is an attachment annotation if isinstance(annotation, PdfAttachmentAnnotationWidget): # Get the file name and data of the attachment fileName = annotation.FileName byteData = annotation.Data streamMs = Stream(byteData) # Save the attachment into a specified folder streamMs.Save("Output\\ExtractedFiles\\" + fileName)
Apply for a Temporary License
If you'd like to remove the evaluation message from the generated documents, or to get rid of the function limitations, please request a 30-day trial license for yourself.