Thursday, 12 April 2012 06:24

PDF Text Format for Silverlight

The sample demonstrates how to set PDF text format in Silverlight via Spire.PDF.

Thursday, 12 April 2012 06:08

Word Table for Silverlight

The sample demonstrates how to create a table in a Word document in Silverlight via Spire.Doc.

Thursday, 12 April 2012 05:57

Edit Excel in Silverlight

The sample demonstrates how to edit an Excel file in Silverlight via Spire.XLS.

PDF documents have a fixed layout and are not easy to modify. To make PDF content editable again, you can convert the PDF to Word or extract its text. In this article, you will learn how to extract text from a specific PDF page, how to extract text from a particular rectangular area, and how to extract text using SimpleTextExtractionStrategy in C# and VB.NET with Spire.PDF for .NET.

Install Spire.PDF for .NET

To begin with, you need to add the DLL files included in the Spire.PDF for .NET package as references in your .NET project. The DLL files can be either downloaded from this link or installed via NuGet.

PM> Install-Package Spire.PDF

Extract Text from a Specified Page

The following are the steps to extract text from a certain page of a PDF document using Spire.PDF for .NET.

  • Create a PdfDocument object.
  • Load a PDF file using PdfDocument.LoadFromFile() method.
  • Get the specific page through PdfDocument.Pages[index] property.
  • Create a PdfTextExtractor object.
  • Create a PdfTextExtractOptions object, and set the IsExtractAllText property to true.
  • Extract text from the selected page using PdfTextExtractor.ExtractText() method.
  • Write the extracted text to a TXT file.
using System;
using System.IO;
using Spire.Pdf;
using Spire.Pdf.Texts;

namespace ExtractTextFromPage
{
    class Program
    {
        static void Main(string[] args)
        {
            //Create a PdfDocument object
            PdfDocument doc = new PdfDocument();

            //Load a PDF file
            doc.LoadFromFile(@"C:\Users\Administrator\Desktop\Terms of Service.pdf");

            //Get the second page
            PdfPageBase page = doc.Pages[1];

            //Create a PdfTextExtractor object
            PdfTextExtractor textExtractor = new PdfTextExtractor(page);

            //Create a PdfTextExtractOptions object
            PdfTextExtractOptions extractOptions = new PdfTextExtractOptions();

            //Set IsExtractAllText to true
            extractOptions.IsExtractAllText = true;

            //Extract text from the page
            string text = textExtractor.ExtractText(extractOptions);

            //Write to a txt file
            File.WriteAllText("Extracted.txt", text);

            //Release the resources held by the document
            doc.Close();
        }
    }
}


Extract Text from a Rectangle

The following are the steps to extract text from a rectangle area of a page using Spire.PDF for .NET.

  • Create a PdfDocument object.
  • Load a PDF file using PdfDocument.LoadFromFile() method.
  • Get the specific page through PdfDocument.Pages[index] property.
  • Create a PdfTextExtractor object.
  • Create a PdfTextExtractOptions object, and specify the rectangle area through the ExtractArea property of it.
  • Extract text from the rectangle using PdfTextExtractor.ExtractText() method.
  • Write the extracted text to a TXT file.
using Spire.Pdf;
using Spire.Pdf.Texts;
using System.IO;
using System.Drawing;

namespace ExtractTextFromRectangleArea
{
    class Program
    {
        static void Main(string[] args)
        {
            //Create a PdfDocument object
            PdfDocument doc = new PdfDocument();

            //Load a PDF file
            doc.LoadFromFile(@"C:\Users\Administrator\Desktop\Terms of Service.pdf");

            //Get the second page
            PdfPageBase page = doc.Pages[1];

            //Create a PdfTextExtractor object
            PdfTextExtractor textExtractor = new PdfTextExtractor(page);

            //Create a PdfTextExtractOptions object
            PdfTextExtractOptions extractOptions = new PdfTextExtractOptions();

            //Set the rectangle area: x, y, width, height
            extractOptions.ExtractArea = new RectangleF(0, 0, 890, 170);

            //Extract text from the rectangle 
            string text = textExtractor.ExtractText(extractOptions);

            //Write to a txt file
            File.WriteAllText("Extracted.txt", text);

            //Release the resources held by the document
            doc.Close();
        }
    }
}
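The sketch below filters text by position using the plain System.Drawing.RectangleF type, to make the idea of an extraction area more concrete. It is not Spire.PDF code. The TextChunk record and the ExtractFromArea helper are hypothetical. Counting a string that only partly overlaps the rectangle as inside (IntersectsWith rather than Contains) is also an assumption, because how Spire.PDF treats partial overlap is not stated here.

```csharp
using System.Collections.Generic;
using System.Drawing;
using System.Linq;

// Hypothetical model: a piece of text and the box it occupies on the page.
public record TextChunk(string Text, RectangleF Bounds);

public static class RectangleAreaSketch
{
    // Keeps, in drawing order, the text of every chunk whose bounds overlap the area.
    // Using IntersectsWith (partial overlap counts) is an assumption of this sketch.
    public static string ExtractFromArea(IEnumerable<TextChunk> chunks, RectangleF area) =>
        string.Concat(chunks.Where(c => area.IntersectsWith(c.Bounds)).Select(c => c.Text));
}
```

A string drawn near the top of the page, for example at (10, 10), falls inside an area such as new RectangleF(0, 0, 890, 170). A string drawn at y = 300 does not.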


Extract Text using SimpleTextExtractionStrategy

The methods above extract text line by line. With SimpleTextExtractionStrategy, the extractor keeps track of the current Y position of each string and inserts a line break into the output whenever the Y position changes. The detailed steps are as follows.

  • Create a PdfDocument object.
  • Load a PDF file using PdfDocument.LoadFromFile() method.
  • Get the specific page through PdfDocument.Pages[index] property.
  • Create a PdfTextExtractor object.
  • Create a PdfTextExtractOptions object and set the IsSimpleExtraction property to true.
  • Extract text from the selected page using PdfTextExtractor.ExtractText() method.
  • Write the extracted text to a TXT file.
using System.IO;
using Spire.Pdf;
using Spire.Pdf.Texts;

namespace SimpleExtraction
{
    class Program
    {
        static void Main(string[] args)
        {
            //Create a PdfDocument object
            PdfDocument doc = new PdfDocument();

            //Load a PDF file
            doc.LoadFromFile(@"C:\Users\Administrator\Desktop\Invoice.pdf");

            //Get the first page
            PdfPageBase page = doc.Pages[0];

            //Create a PdfTextExtractor object
            PdfTextExtractor textExtractor = new PdfTextExtractor(page);

            //Create a PdfTextExtractOptions object
            PdfTextExtractOptions extractOptions = new PdfTextExtractOptions();

            //Set IsSimpleExtraction to true
            extractOptions.IsSimpleExtraction = true;

            //Extract text from the selected page 
            string text = textExtractor.ExtractText(extractOptions);

            //Write to a txt file
            File.WriteAllText("Extracted.txt", text);

            //Release the resources held by the document
            doc.Close();
        }
    }
}
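The line-break rule described for SimpleTextExtractionStrategy can be illustrated with a short sketch. This is not Spire.PDF's implementation. The (Text, Y) input, the SimpleStrategySketch class and the tolerance parameter are all assumptions made for illustration. Strings are taken in the order they are drawn, and a line break is emitted whenever the Y position moves by more than the tolerance.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

public static class SimpleStrategySketch
{
    // Joins strings in drawing order, starting a new line whenever Y changes.
    // The tolerance (hypothetical) absorbs tiny baseline differences on one line.
    public static string Join(IEnumerable<(string Text, float Y)> chunks, float tolerance = 0.01f)
    {
        var sb = new StringBuilder();
        float? lastY = null;
        foreach (var (text, y) in chunks)
        {
            // A different Y position means this string belongs to a new line.
            if (lastY.HasValue && Math.Abs(y - lastY.Value) > tolerance)
                sb.Append('\n');
            sb.Append(text);
            lastY = y;
        }
        return sb.ToString();
    }
}
```

For example, "Invoice" and " #001" drawn at the same Y, followed by "Total" drawn lower on the page, produce two lines.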


Apply for a Temporary License

If you'd like to remove the evaluation message from the generated documents, or to get rid of the function limitations, please request a 30-day trial license for yourself.

When looking for ways to merge PDF files, whether online or with C#/VB.NET, you will likely worry about the security of your files, how long the merge takes, and whether the merged file can show page numbers. This section introduces a secure solution for merging PDF files into one in C# and VB.NET with the .NET PDF component Spire.PDF for .NET.

Spire.PDF for .NET, built from scratch in C#, enables programmers and developers to create, read, write and manipulate PDF documents in .NET applications without Adobe Acrobat or any external libraries. With Spire.PDF for .NET, you can not only merge PDF files quickly but also add page numbers to the merged pages. The screenshot below shows the result:

Merge PDF Documents

Before following the procedure below, please download Spire.PDF for .NET and install it on your system.

Step1: Use a String array to store the names of the three PDF files to be merged, and declare a Spire.Pdf.PdfDocument array. Then load the three PDF files and use the first PdfDocument as the target into which the second and third files are merged. To import all pages of the second PDF file into the first, call the method public void AppendPage(PdfDocument doc). Then call public PdfPageBase InsertPage(PdfDocument doc, int pageIndex) to import pages of the third PDF file by index. Note that the loop below increases the index by 2, so it imports every other page of the third file (indexes 0, 2, 4 and so on); increase the index by 1 to import every page.

[C#]
private void button1_Click(object sender, EventArgs e)
        {
            //pdf document list
            String[] files = new String[]
            {
                @"..\PDFmerge0.pdf",
                @"..\PDFmerge1.pdf",
                @"..\PDFmerge2.pdf"
            };
            //open pdf documents            
            PdfDocument[] docs = new PdfDocument[files.Length];
            for (int i = 0; i < files.Length; i++)
            {
                docs[i] = new PdfDocument(files[i]);
            }
            //append document
            docs[0].AppendPage(docs[1]);

            //import PDF pages
            for (int i = 0; i < docs[2].Pages.Count; i = i + 2)
            {
                docs[0].InsertPage(docs[2], i);
            }
[VB.NET]
 Private Sub button1_Click(sender As Object, e As EventArgs)
	'pdf document list
	Dim files As String() = New String() {"..\PDFmerge0.pdf", "..\PDFmerge1.pdf", "..\PDFmerge2.pdf"}
	'open pdf documents            
	Dim docs As PdfDocument() = New PdfDocument(files.Length - 1) {}
	For i As Integer = 0 To files.Length - 1
		docs(i) = New PdfDocument(files(i))
	Next

	'append document
	docs(0).AppendPage(docs(1))

	'import every other page of the third document (indexes 0, 2, 4, ...)
	For j As Integer = 0 To docs(2).Pages.Count - 1 Step 2
		docs(0).InsertPage(docs(2), j)
	Next
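Because the import loop steps by 2, it helps to see which pages end up in the first document. The sketch below only lists page labels and does not call Spire.PDF. MergeOrderSketch is a hypothetical helper. It assumes that AppendPage adds the pages of docs[1] at the end of docs[0], and that each InsertPage(docs[2], i) call adds page i of docs[2] after them.

```csharp
using System.Collections.Generic;

public static class MergeOrderSketch
{
    // Lists where each page of docs[0] comes from after Step 1, as "document:page" labels.
    // Assumption: both AppendPage and InsertPage add pages at the end of docs[0].
    public static List<string> PageOrder(int count0, int count1, int count2, int step = 2)
    {
        var order = new List<string>();
        for (int p = 0; p < count0; p++) order.Add($"0:{p}");   // original pages
        for (int p = 0; p < count1; p++) order.Add($"1:{p}");   // AppendPage(docs[1])
        for (int i = 0; i < count2; i += step) order.Add($"2:{i}"); // InsertPage(docs[2], i)
        return order;
    }
}
```

Take three files with 1, 1 and 5 pages. The result is the first file's page, then the second file's page, then pages 0, 2 and 4 of the third file. Passing step 1 imports all five pages of the third file.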

Step2: Draw page numbers in the first PDF file. In this step, set the page number margins with the class Spire.Pdf.Graphics.PdfMargins. Then call the custom method DrawPageNumber(PdfPageCollection pages, PdfMargins margin, int startNumber, int pageCount) to add a page number at the bottom of every page of the first PDF. Please see the detailed code below:

[C#]
            //set PDF margin
            PdfUnitConvertor unitCvtr = new PdfUnitConvertor();
            PdfMargins margin = new PdfMargins();
            margin.Top = unitCvtr.ConvertUnits(2.54f, PdfGraphicsUnit.Centimeter, PdfGraphicsUnit.Point);
            margin.Bottom = margin.Top;
            margin.Left = unitCvtr.ConvertUnits(3.17f, PdfGraphicsUnit.Centimeter, PdfGraphicsUnit.Point);
            margin.Right = margin.Left;
            this.DrawPageNumber(docs[0].Pages, margin, 1, docs[0].Pages.Count);

            //save the merged document (the output file name is only an example)
            docs[0].SaveToFile("MergedDocument.pdf");

            //close all documents
            foreach (PdfDocument doc in docs)
            {
                doc.Close();
            }
        }

        private void DrawPageNumber(PdfPageCollection pages, PdfMargins margin, int startNumber, int pageCount)
        {
            foreach (PdfPageBase page in pages)
            {
                //draw the footer semi-transparently
                page.Canvas.SetTransparency(0.5f);
                PdfBrush brush = PdfBrushes.Black;
                PdfPen pen = new PdfPen(brush, 0.75f);
                PdfTrueTypeFont font = new PdfTrueTypeFont(new Font("Arial", 9f, System.Drawing.FontStyle.Italic), true);
                PdfStringFormat format = new PdfStringFormat(PdfTextAlignment.Right);
                format.MeasureTrailingSpaces = true;

                //draw a rule just below the bottom margin, across the text width
                float space = font.Height * 0.75f;
                float x = margin.Left;
                float width = page.Canvas.ClientSize.Width - margin.Left - margin.Right;
                float y = page.Canvas.ClientSize.Height - margin.Bottom + space;
                page.Canvas.DrawLine(pen, x, y, x + width, y);

                //draw a right-aligned "n of total" label under the rule
                y = y + 1;
                String numberLabel = String.Format("{0} of {1}", startNumber++, pageCount);
                page.Canvas.DrawString(numberLabel, font, brush, x + width, y, format);
                page.Canvas.SetTransparency(1);
            }
        }
[VB.NET]
	'set PDF margin
	Dim unitCvtr As New PdfUnitConvertor()
	Dim margin As New PdfMargins()
	margin.Top = unitCvtr.ConvertUnits(2.54F, PdfGraphicsUnit.Centimeter, PdfGraphicsUnit.Point)
	margin.Bottom = margin.Top
	margin.Left = unitCvtr.ConvertUnits(3.17F, PdfGraphicsUnit.Centimeter, PdfGraphicsUnit.Point)
	margin.Right = margin.Left
	Me.DrawPageNumber(docs(0).Pages, margin, 1, docs(0).Pages.Count)

	'save the merged document (the output file name is only an example)
	docs(0).SaveToFile("MergedDocument.pdf")

	'close all documents
	For Each doc As PdfDocument In docs
		doc.Close()
	Next
End Sub

Private Sub DrawPageNumber(pages As PdfPageCollection, margin As PdfMargins, startNumber As Integer, pageCount As Integer)
	For Each page As PdfPageBase In pages
		'draw the footer semi-transparently
		page.Canvas.SetTransparency(0.5F)
		Dim brush As PdfBrush = PdfBrushes.Black
		Dim pen As New PdfPen(brush, 0.75F)
		Dim font As New PdfTrueTypeFont(New Font("Arial", 9F, System.Drawing.FontStyle.Italic), True)
		Dim format As New PdfStringFormat(PdfTextAlignment.Right)
		format.MeasureTrailingSpaces = True

		'draw a rule just below the bottom margin, across the text width
		Dim space As Single = font.Height * 0.75F
		Dim x As Single = margin.Left
		Dim width As Single = page.Canvas.ClientSize.Width - margin.Left - margin.Right
		Dim y As Single = page.Canvas.ClientSize.Height - margin.Bottom + space
		page.Canvas.DrawLine(pen, x, y, x + width, y)

		'draw a right-aligned "n of total" label under the rule
		y = y + 1
		Dim numberLabel As String = String.Format("{0} of {1}", startNumber, pageCount)
		startNumber += 1
		page.Canvas.DrawString(numberLabel, font, brush, x + width, y, format)
		page.Canvas.SetTransparency(1)
	Next
End Sub
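The margin and footer arithmetic in Step 2 can be checked by hand. This sketch assumes the standard PDF unit of 72 points per inch (2.54 cm). Under that assumption, 2.54 cm becomes 72 points and 3.17 cm becomes about 89.86 points. FooterLineY repeats the y calculation from DrawPageNumber. FooterLayoutSketch is a hypothetical helper, not part of Spire.PDF.

```csharp
public static class FooterLayoutSketch
{
    // Assumes 1 inch = 2.54 cm = 72 PDF points.
    public static float CmToPoints(float cm) => cm / 2.54f * 72f;

    // Same formula as DrawPageNumber: the rule sits below the bottom margin edge
    // by 75% of the font height.
    public static float FooterLineY(float clientHeight, float marginBottom, float fontHeight) =>
        clientHeight - marginBottom + fontHeight * 0.75f;
}
```

For a client area 800 points high, a 72-point bottom margin and a 12-point font height, the rule is drawn at y = 800 - 72 + 9 = 737.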

The PDF merge code may look long at first sight, but if you do not need page numbers in the merged PDF, you can skip Step 2. In many cases, however, page numbers make a PDF easier to read and print. Spire.PDF for .NET lets you both merge PDF files and add page numbers to the merged file.

Thursday, 05 April 2012 06:48

Set PDF Properties in Silverlight

The sample demonstrates how to set PDF properties in Silverlight via Spire.PDF.

Thursday, 05 April 2012 06:35

Create Excel Document in Silverlight

The sample demonstrates how to create an Excel file in Silverlight via Spire.XLS.

Thursday, 05 April 2012 06:24

Word Bookmark in Silverlight

The sample demonstrates how to add a bookmark to a Word document in Silverlight via Spire.Doc.

Thursday, 05 April 2012 03:07

How to Set Word Table Style in C#, VB.NET

Tables in Microsoft Word present data that helps explain the surrounding paragraphs. To give a table a better appearance, you can set its style. This guide shows how to use Spire.Doc to set a table style in Word with C#/VB.NET.

Download Spire.Doc (or Spire.Office) and make sure .NET Framework 2.0 or above is installed. Once Spire.Doc (or Spire.Office) is correctly installed on your system, follow the steps below to set the Word table style.

In this example, a Word document containing a table has been prepared. It is a student transcript template from Office.com.

Step 1: Create a C#/VB.NET project in Visual Studio, add Spire.Doc.dll as a reference, and load the Word document.

[C#]
Document document = new Document();
document.LoadFromFile(@"E:\work\Documents\Student Transcript.docx");
[VB.NET]
Dim document As New Document()
document.LoadFromFile("E:\work\Documents\Student Transcript.docx")

Step 2: Set Table Style

Get the table you want to style.

Because document.Sections[0].Tables[1] returns the ITable interface rather than the Table class, cast it to Table explicitly. The index is zero-based, so Tables[1] is the second table in the first section.

[C#]
Table table1 = (Table)document.Sections[0].Tables[1];
[VB.NET]
Dim table1 As Table = CType(document.Sections(0).Tables(1), Table)

Set the height of the first row.

[C#]
table1.Rows[0].Height = 25;
[VB.NET]
table1.Rows(0).Height = 25

Set Cell Style

To show the difference, keep the first cell of the first row unchanged and style only the second cell. First, set the vertical alignment and background color of the second cell. Then declare a paragraph style with a font size and text color, add it to the document's styles, and apply it to the cell's first paragraph.

[C#]
table1.Rows[0].Cells[1].CellFormat.VerticalAlignment = VerticalAlignment.Middle;
table1.Rows[0].Cells[1].CellFormat.BackColor = Color.LimeGreen;

ParagraphStyle style = new ParagraphStyle(document);
style.Name = "TableStyle";
style.CharacterFormat.FontSize = 14;
style.CharacterFormat.TextColor = Color.GhostWhite;
document.Styles.Add(style);
table1.Rows[0].Cells[1].Paragraphs[0].ApplyStyle(style.Name);
[VB.NET]
table1.Rows(0).Cells(1).CellFormat.VerticalAlignment = VerticalAlignment.Middle
table1.Rows(0).Cells(1).CellFormat.BackColor = Color.LimeGreen

Dim style As New ParagraphStyle(document)
style.Name = "TableStyle"
style.CharacterFormat.FontSize = 14
style.CharacterFormat.TextColor = Color.GhostWhite
document.Styles.Add(style)
table1.Rows(0).Cells(1).Paragraphs(0).ApplyStyle(style.Name)

Step 3: Save and Launch

[C#]
document.SaveToFile("WordTable.docx", FileFormat.Docx);
System.Diagnostics.Process.Start("WordTable.docx");
[VB.NET]
document.SaveToFile("WordTable.docx", FileFormat.Docx)
System.Diagnostics.Process.Start("WordTable.docx")

Screenshot of the result:

Word Table Format

This guide shows how to set a Word table style, such as row height and colors, with Spire.Doc. Spire.Doc can also perform many other operations on Word documents.

Thursday, 01 September 2022 03:22

C#/VB.NET: Sort Data in Excel

Sorting in Excel is one of the most commonly used features in data analysis. It allows users to sort text, numbers, dates and times in ascending, descending or alphabetical order. This article will demonstrate how to programmatically sort data in a cell range using Spire.XLS for .NET.

Install Spire.XLS for .NET

To begin with, you need to add the DLL files included in the Spire.XLS for .NET package as references in your .NET project. The DLL files can be either downloaded from this link or installed via NuGet.

PM> Install-Package Spire.XLS

Sort Data in Excel

The detailed steps are as follows.

  • Create a Workbook instance.
  • Load a sample Excel document using Workbook.LoadFromFile() method.
  • Get the first worksheet using Workbook.Worksheets[index] property.
  • Get the sort fields collection using the Workbook.DataSorter.SortColumns property, and then specify the column to be sorted and the sort mode using the SortColumns.Add(int key, SortComparsionType, OrderBy) method.
  • Sort the data in the specified cell range using Workbook.DataSorter.Sort(CellRange range) method.
  • Save the result document using Workbook.SaveToFile() method.
using Spire.Xls;

namespace SortData
{
    class Program
    {
        static void Main(string[] args)
        {
            //Create a Workbook instance
            Workbook workbook = new Workbook();

            //Load a sample Excel document
            workbook.LoadFromFile("sample.xlsx");

            //Get the first worksheet
            Worksheet worksheet = workbook.Worksheets[0];

            //Specify the column to be sorted (index 0) and the sort mode (ascending or descending)
            workbook.DataSorter.SortColumns.Add(0, SortComparsionType.Values, OrderBy.Ascending);

            //Sort data in the specified cell range
            workbook.DataSorter.Sort(worksheet.Range["A1:D10"]);

            //Save the result document
            workbook.SaveToFile("Sort.xlsx", ExcelVersion.Version2016);
        }
    }
}
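DataSorter.Sort reorders whole rows of the range by the key column. As a rough analogy only, the hypothetical SortRows helper below uses plain LINQ rather than Spire.XLS. It sorts rows of numbers by one column index, ascending or descending, and keeps each row's cells together.

```csharp
using System.Collections.Generic;
using System.Linq;

public static class SortRangeSketch
{
    // Sorts whole rows by the value in column `key`, so each row's cells stay together.
    // OrderBy/OrderByDescending are stable: rows with equal keys keep their original order.
    public static List<double[]> SortRows(IEnumerable<double[]> rows, int key, bool ascending = true) =>
        (ascending ? rows.OrderBy(r => r[key]) : rows.OrderByDescending(r => r[key])).ToList();
}
```

Sorting the rows {3, 30}, {1, 10} and {2, 20} by column 0 in ascending order gives {1, 10}, {2, 20}, {3, 30}. The second value in each row moves together with its key.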

