Get content-type of the embedded image inside Docx

Mon Feb 08, 2021 5:21 am

Hi Team,

We need to read the content type of embedded images while parsing a DOCX file, and then store that data in a database for further processing.

if (node.getDocumentObjectType().equals(DocumentObjectType.Picture)) { // node is an element of the paragraph's child collection
    DocPicture picture = (DocPicture) childObj;
    // Able to read the image here
}

Please help us get the content type of any image, such as png, jpg, or bmp.

Thank you

Mon Feb 08, 2021 8:42 am

Hello,

Thanks for your inquiry!

Please refer to the following code to get the content type of the embedded images while parsing the DOCX file.

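Code:

    import java.io.BufferedInputStream;
    import java.io.ByteArrayInputStream;
    import java.io.InputStream;
    import java.net.URLConnection;
    // The Spire.Doc classes used below (Document, Section, Paragraph,
    // DocumentObject, DocumentObjectType, DocPicture) also need to be imported.

    Document doc = new Document();
    doc.loadFromFile("Doc1.docx");

    // Walk every section, paragraph, and child object in the document
    for (int i = 0; i < doc.getSections().getCount(); i++) {
        Section sec = doc.getSections().get(i);
        for (int j = 0; j < sec.getParagraphs().getCount(); j++) {
            Paragraph para = sec.getParagraphs().get(j);
            for (int k = 0; k < para.getChildObjects().getCount(); k++) {
                DocumentObject obj = para.getChildObjects().get(k);
                if (obj.getDocumentObjectType() == DocumentObjectType.Picture) {
                    DocPicture pic = (DocPicture) obj;
                    byte[] imageByte = pic.getImageBytes();
                    // guessContentTypeFromStream needs a stream that supports mark/reset
                    InputStream inputStream = new BufferedInputStream(new ByteArrayInputStream(imageByte));
                    // Returns null when the leading bytes are not recognized
                    String mimeType = URLConnection.guessContentTypeFromStream(inputStream);
                    System.out.println(mimeType);
                }
            }
        }
    }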
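The detection step in the code above does not depend on Spire.Doc, so it can be tried on its own. The following is only a minimal sketch: the class name MimeSniffer, the helper detectMimeType, and the "application/octet-stream" fallback are assumptions made for illustration, not part of the product API. It feeds the standard 8-byte PNG signature to URLConnection.guessContentTypeFromStream in place of the bytes returned by pic.getImageBytes().

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.net.URLConnection;

// Hypothetical helper class, for illustration only.
class MimeSniffer {

    // Guesses the MIME type from the leading bytes of the data.
    // Falls back to a generic type (an assumption of this sketch)
    // when the JDK does not recognize the signature.
    static String detectMimeType(byte[] data) {
        // ByteArrayInputStream supports mark/reset, which the JDK method requires.
        try (InputStream in = new ByteArrayInputStream(data)) {
            String mime = URLConnection.guessContentTypeFromStream(in);
            return mime != null ? mime : "application/octet-stream";
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // The standard 8-byte PNG file signature.
        byte[] pngSignature = {(byte) 0x89, 'P', 'N', 'G', 0x0D, 0x0A, 0x1A, 0x0A};
        System.out.println(detectMimeType(pngSignature)); // prints "image/png"
    }
}
```

Because guessContentTypeFromStream only knows a fixed set of signatures, it can return null for some image formats, so a fallback value like the one above may be worth keeping before writing the result to a database.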

If the code does not meet your needs, please provide us with your input file for further investigation. Thanks in advance.

Sincerely,
Marcia
E-iceblue support team

Mon Feb 08, 2021 10:52 am

Thank you for the response.

Can we also get the file format, so that we know whether the image is .png or .jpg?

Tue Feb 09, 2021 3:51 am

Hello,

Thanks for your feedback!

Please refer to the following code to get the file format.

Code:

    import java.io.ByteArrayInputStream;
    import java.util.Iterator;
    import javax.imageio.ImageIO;
    import javax.imageio.ImageReader;
    import javax.imageio.stream.MemoryCacheImageInputStream;
    // The Spire.Doc classes used below (Document, Section, Paragraph,
    // DocumentObject, DocumentObjectType, DocPicture) also need to be imported.

    Document doc = new Document();
    doc.loadFromFile("Doc1.docx");

    // Walk every section, paragraph, and child object in the document
    for (int i = 0; i < doc.getSections().getCount(); i++) {
        Section sec = doc.getSections().get(i);
        for (int j = 0; j < sec.getParagraphs().getCount(); j++) {
            Paragraph para = sec.getParagraphs().get(j);
            for (int k = 0; k < para.getChildObjects().getCount(); k++) {
                DocumentObject obj = para.getChildObjects().get(k);
                if (obj.getDocumentObjectType() == DocumentObjectType.Picture) {
                    DocPicture pic = (DocPicture) obj;
                    byte[] imageByte = pic.getImageBytes();
                    MemoryCacheImageInputStream imageInputStream =
                            new MemoryCacheImageInputStream(new ByteArrayInputStream(imageByte));
                    // Ask ImageIO which registered readers can decode these bytes
                    Iterator<ImageReader> iterator = ImageIO.getImageReaders(imageInputStream);
                    if (iterator.hasNext()) {
                        ImageReader reader = iterator.next();
                        String imageFormat = reader.getFormatName();
                        System.out.println(imageFormat);
                    } else {
                        // No registered reader recognizes the image data
                        System.out.println("image data wrong!");
                    }
                }
            }
        }
    }
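The ImageIO lookup above can also be checked without a DOCX file. This is a rough sketch under stated assumptions: the class ImageFormatProbe and its methods detectFormat and samplePng are hypothetical names introduced here, and a 1x1 PNG generated in memory stands in for the bytes returned by pic.getImageBytes(). The format name is lower-cased because its letter case can differ between readers.

```java
import java.awt.image.BufferedImage;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.Iterator;
import java.util.Locale;
import javax.imageio.ImageIO;
import javax.imageio.ImageReader;
import javax.imageio.stream.ImageInputStream;
import javax.imageio.stream.MemoryCacheImageInputStream;

// Hypothetical helper class, for illustration only.
class ImageFormatProbe {

    // Returns the lower-cased ImageIO format name,
    // or null if no registered reader recognizes the bytes.
    static String detectFormat(byte[] data) {
        try (ImageInputStream in = new MemoryCacheImageInputStream(new ByteArrayInputStream(data))) {
            Iterator<ImageReader> readers = ImageIO.getImageReaders(in);
            if (!readers.hasNext()) {
                return null;
            }
            ImageReader reader = readers.next();
            try {
                return reader.getFormatName().toLowerCase(Locale.ROOT);
            } finally {
                reader.dispose(); // release resources held by the reader
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Builds a tiny PNG in memory so the sketch needs no input file.
    static byte[] samplePng() {
        try {
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            ImageIO.write(new BufferedImage(1, 1, BufferedImage.TYPE_INT_RGB), "png", out);
            return out.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(detectFormat(samplePng()));
    }
}
```

Returning null instead of printing a message leaves it to the caller to decide how to record an unrecognized image, which may be more convenient when the result goes into a database.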

If you encounter any issues related to our product in the future, just feel free to contact us.

Sincerely,
Marcia
E-iceblue support team

Fri Mar 05, 2021 8:24 am

Hello,

Hope you are doing well!

Has the issue been solved now? Could you please give us some feedback at your convenience?

Thanks in advance.

Sincerely,
Marcia
E-iceblue support team
