Spire.PDF is a professional PDF library applied to creating, writing, editing, handling and reading PDF files without any external dependencies. Get free and professional technical support for Spire.PDF for .NET, Java, Android, C++, Python.
Fri Jun 18, 2021 6:48 am
I need to completely remove the text layer from a PDF (the PDF will later be OCR'd). Is it possible to do this without converting the PDF to images?
Thanks for your inquiry. Is the layer you mentioned similar to the one in the screenshot below (in Adobe)? If so, Spire.PDF supports removing layers by name. You can test with the following code.
PdfDocument doc = new PdfDocument();
doc.LoadFromFile("Test.pdf");
doc.Layers.RemoveLayer("Layer Name");
doc.SaveToFile("Output.pdf");
If I have misunderstood, please provide your PDF document and the output document you expect, so that we can better understand your requirement. Thanks in advance.
Thanks for the quick response. The PDF doesn't present the text "layer" in that way. I guess what I'm looking for is the equivalent of "Save as PDF Image File" in Adobe, so that the existing OCR'd text is removed.
Thanks for your inquiry, and sorry for the late reply over the weekend. Spire.PDF does not currently support the feature you mentioned. I'm afraid the only option is to convert the PDF to images. We apologize for the inconvenience.