Split a range of pages into separate PDFs by titles

Mon Aug 23, 2021 9:44 am

Hello There,

I'm trying to split specific pages out of a PDF based on title keywords that appear in the document's text. For example: start at the page that has the title "TitleToStartSplitting", continue until the page with "TittleToStopSplitting", and save those pages as the first PDF; then split from the second title to the third title into a separate PDF, and so on. We can't rely on page numbers, page counts, or any other splitting logic, and we need to create new documents containing specific pages.

Any help would be much appreciated. Thank you in advance.

The code is written in C# on ASP.NET Core 5.
We are using a licensed version of Spire.

My attempt is based on an existing question in the Spire forum:

Code:
PdfTextFind[] findResult = null;
List<int> urlBorrower = new List<int>();
foreach (PdfPageBase page in pdf.Pages)
{
    //Find text
    findResult = page.FindText("TitleOne", TextFindParameter.CrossLine).Finds;
    foreach (PdfTextFind find in findResult)
    {
        int pageindex = find.SearchPageIndex;
        urlBorrower.Add(pageindex);
    }
    var lender = page.FindText("TitleTwo", TextFindParameter.CrossLine).Finds;
    var title3 = page.FindText("Title3", TextFindParameter.CrossLine).Finds;
    var title4 = page.FindText("title4", TextFindParameter.CrossLine).Finds;
    if (title3.Length > 0 || title4.Length > 0 || lender.Length > 0)
        break;
}
PdfDocument newpdf = new PdfDocument();
for (int i = 0; i < urlBorrower.Count; i++)
{
    int currentIndex = urlBorrower[i];
    int nextIndex = 0;
    if (i.Equals(urlBorrower.Count - 1))
    {
        nextIndex = pdf.Pages.Count;
    }
    else
    {
        nextIndex = urlBorrower[i + 1];
    }
    for (int j = currentIndex; j < nextIndex; j++)
    {
        PdfPageBase page = pdf.Pages[j];
        PdfPageBase newPage = newpdf.Pages.Add(page.ActualSize, new PdfMargins(0));
        page.CreateTemplate().Draw(newPage.Canvas, new PointF(0, 0));
    }
    newpdf.SaveToFile(@"custom" + i + ".pdf", FileFormat.PDF);
}
newpdf.Dispose();
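
The splitting rule described in this post (each title opens a section that runs until the next title) can be sketched independently of the PDF library. This is only a minimal sketch, not Spire's API: `TitleRanges`, `Compute`, and the `pageTexts` list (one string of extracted text per page) are hypothetical names, and it assumes the titles appear in the document in the order given and that the last section runs to the end of the document.

```csharp
using System;
using System.Collections.Generic;

public static class TitleRanges
{
    // Returns one (Start, EndExclusive) page range per title, in title order.
    // Assumption: titles occur in the document in the same order as in `titles`.
    public static List<(int Start, int EndExclusive)> Compute(
        IReadOnlyList<string> pageTexts, IReadOnlyList<string> titles)
    {
        var starts = new List<int>();
        int from = 0;
        foreach (string title in titles)
        {
            // First page at or after `from` whose text contains the title.
            int found = -1;
            for (int i = from; i < pageTexts.Count; i++)
            {
                if (pageTexts[i].Contains(title, StringComparison.Ordinal))
                {
                    found = i;
                    break;
                }
            }
            if (found < 0)
                throw new InvalidOperationException($"Title not found: {title}");
            starts.Add(found);
            from = found + 1; // the next title must start on a later page
        }

        var ranges = new List<(int Start, int EndExclusive)>();
        for (int k = 0; k < starts.Count; k++)
        {
            // A section ends where the next one starts; the last one
            // (by assumption) runs to the end of the document.
            int end = k + 1 < starts.Count ? starts[k + 1] : pageTexts.Count;
            ranges.Add((starts[k], end));
        }
        return ranges;
    }
}
```

Each returned range could then be copied into its own new PdfDocument with the page-copy loop from the snippet above, so that pages from one section do not accumulate in the next output file.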

Mon Aug 23, 2021 11:54 am

Hello,

Thanks for your inquiry.
Please refer to the following code.

Code:
PdfDocument pdf = new PdfDocument();
pdf.LoadFromFile("input.pdf");
PdfDocument newpdf = null;
for (int i = 0; i < pdf.Pages.Count; i++)
{
    string pageText = pdf.Pages[i].ExtractText();
    if (pageText.Contains("TitleToStartSplitting"))
    {
        if (newpdf == null)
        {
            newpdf = new PdfDocument();
        }
        else
        {
            newpdf.AppendPage().Canvas.DrawTemplate(pdf.Pages[i].CreateTemplate(), new PointF(0, 0));
        }
    }
    if (pageText.Contains("TittleToStopSplitting"))
    {
        if (newpdf != null)
        {
            newpdf.AppendPage().Canvas.DrawTemplate(pdf.Pages[i].CreateTemplate(), new PointF(0, 0));
            newpdf.SaveToFile(@"custom" + i + ".pdf", FileFormat.PDF);
            newpdf = null;
            continue;
        }
    }
    if (newpdf != null)
    {
        newpdf.AppendPage().Canvas.DrawTemplate(pdf.Pages[i].CreateTemplate(), new PointF(0, 0));
    }
}

If there are any other issues related to our products, please feel free to contact us.

Sincerely,
Brian
E-iceblue support team

Mon Aug 23, 2021 12:45 pm

Thank you for your reply.

Sorry, I didn't mention that "TitleToStartSplitting" may also appear as normal text in the PDF, not only as a sub-title (in case this makes a difference).

I tried the snippet you attached, but it duplicates pages (instead of the 7 expected pages, the generated PDF has 15) and the pages are out of order. I also kept trying on my side and came up with a simple solution that seems to work for me, but I believe there is a better way, so I've attached it to get your opinion or a hint.

Thank you

I call this function 3 times to split 3 PDFs from the original PDF, each call with a different searchFrom and searchTo title.
Logic:

Find the first page that contains searchFrom and assign its index to startInd, then find the first page that contains searchTo and assign its index to endIndex, then loop over the pages in between to extract them into a separate PDF.

Code:
void ExtractByUnmarriedAddendum(PdfDocument pdf, string docTitle, string searchFrom, string searchTo)
{
    var startInd = 0;
    foreach (PdfPageBase page in pdf.Pages)
    {
        var f = page.FindText(searchFrom, TextFindParameter.CrossLine).Finds;
        if (f.Length > 0)
        {
            startInd = f.FirstOrDefault().SearchPageIndex;
            break;
        }
    }
    var endIndex = 0;
    foreach (PdfPageBase page in pdf.Pages)
    {
        var f = page.FindText(searchTo, TextFindParameter.CrossLine).Finds;
        if (f.Length > 0)
        {
            endIndex = f.FirstOrDefault().SearchPageIndex;
            break;
        }
    }
    PdfDocument newpdfBo = new PdfDocument();
    for (var i = startInd; i < endIndex; i++)
    {
        PdfPageBase page = pdf.Pages[i];
        PdfPageBase newPage = newpdfBo.Pages.Add(page.ActualSize, new PdfMargins(0));
        page.CreateTemplate().Draw(newPage.Canvas, new PointF(0, 0));
    }
    newpdfBo.SaveToFile(@$"Spire\{docTitle}.pdf", FileFormat.PDF);
    newpdfBo.Dispose();
}
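
For comparison, the index search described under "Logic" can be sketched without any PDF types. This is a hedged variant, not the function above: `RangeFinder`, `Find`, and `pageTexts` (one extracted-text string per page) are hypothetical names. It differs in two assumed ways: the search for searchTo begins on the page after the start page, so an earlier occurrence of the end title is ignored, and a missing title yields null rather than silently leaving the index at 0. As in the function above, the end page itself is excluded.

```csharp
using System;
using System.Collections.Generic;

public static class RangeFinder
{
    // Returns (Start, EndExclusive), or null when either title is not found.
    public static (int Start, int EndExclusive)? Find(
        IReadOnlyList<string> pageTexts, string searchFrom, string searchTo)
    {
        int start = IndexOfPage(pageTexts, searchFrom, 0);
        if (start < 0) return null;

        // Look for the end title only on pages after the start page.
        int end = IndexOfPage(pageTexts, searchTo, start + 1);
        if (end < 0) return null;

        return (start, end);
    }

    // Index of the first page at or after `from` containing `keyword`, else -1.
    private static int IndexOfPage(
        IReadOnlyList<string> pageTexts, string keyword, int from)
    {
        for (int i = from; i < pageTexts.Count; i++)
        {
            if (pageTexts[i].Contains(keyword, StringComparison.Ordinal))
                return i;
        }
        return -1;
    }
}
```

A caller could check the result for null before creating the output document, which avoids saving an empty PDF when a title is missing.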

Tue Aug 24, 2021 2:27 am

Hello,

Thanks for your feedback.
To help us investigate your issue more accurately and quickly, please provide your input file and your desired output. You could upload them here or send them to us (support@e-iceblue.com) via email. Thanks in advance.

Sincerely,
Brian
E-iceblue support team

Wed Sep 01, 2021 10:05 am

Hello,

Greetings from E-iceblue!
Has your issue been resolved? If not, could you please provide your input file and your desired output to us for further investigation?

Sincerely,
Brian
E-iceblue support team

Mon Sep 06, 2021 1:35 pm

Hello There,

Sorry for the late response. I was discussing the requested functionality with our Business Team; we were able to figure out a solution, and I now have it working. Thank you for your help.

Best,
Obaida

Tue Sep 07, 2021 1:18 am

Thanks for your response. I am glad to hear that you have found a solution.
If you encounter any issues related to our products in the future, please feel free to contact us.

Sincerely,
Brian
E-iceblue support team
