How to Convert HTML to XML in C#, VB.NET

Office OpenXML is an XML-based format for word-processing documents, and it is often used alongside HTML when structured content has to be exchanged. As a result, developers sometimes need to convert HTML to Office OpenXML. This article walks through the conversion using Spire.Doc, a professional Word library for .NET.

Before starting, complete the following preparation:

  • Download Spire.Doc and install it on your machine.
  • Add the Spire.Doc DLL files to your project as references.
  • To do so, open the Bin folder and select the three DLL files under .NET 4.0.
  • Right-click the project and select Properties from its menu.
  • Set the target framework to .NET 4.
  • Import the Spire.Doc namespace in your code.

The following steps show how to perform the conversion:

Step 1: Create a Word document.

[C#]
Document doc = new Document();

Step 2: Load the HTML file.

[C#]
doc.LoadFromFile("Sample.html");

Step 3: Save the document as an XML file.

[C#]
doc.SaveToFile("test.xml", FileFormat.Xml);
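
The three steps can also be written with the input format stated explicitly and with a check that the input file exists. The following is a minimal sketch, not the library's recommended usage: it assumes the `LoadFromFile(string, FileFormat, XHTMLValidationType)` overload and the `XHTMLValidationType` enum in the `Spire.Doc.Documents` namespace, which may differ in the version you installed. The class name `LoadHtmlExplicitly` is hypothetical.

```csharp
using System;
using System.IO;
using Spire.Doc;
using Spire.Doc.Documents; // assumed location of XHTMLValidationType

class LoadHtmlExplicitly
{
    static void Main()
    {
        const string inputPath = "Sample.html"; // same sample file as in the steps

        // Stop with a clear message if the input file is missing.
        if (!File.Exists(inputPath))
        {
            Console.Error.WriteLine("Input file not found: " + inputPath);
            return;
        }

        Document doc = new Document();
        // Assumed overload: explicit HTML format, no XHTML validation.
        doc.LoadFromFile(inputPath, FileFormat.Html, XHTMLValidationType.None);
        doc.SaveToFile("test.xml", FileFormat.Xml);
    }
}
```

Skipping XHTML validation is meant for HTML that is not strict XHTML. If your input is valid XHTML, a stricter validation value may be more appropriate.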

Here is the full code in C# and VB.NET:

[C#]
using Spire.Doc;

namespace HTMLXML
{
    class Program
    {
        static void Main(string[] args)
        {
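            // Create an empty Word document object.
            Document doc = new Document();
            // Load the HTML file into the document.
            doc.LoadFromFile("Sample.html");
            // Save the loaded content as an XML file.
            doc.SaveToFile("test.xml", FileFormat.Xml);
        }
    }
}

[VB.NET]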
Imports Spire.Doc

Namespace HTMLXML
	Class Program
		Private Shared Sub Main(args As String())
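			' Create an empty Word document object.
			Dim doc As New Document()
			' Load the HTML file into the document.
			doc.LoadFromFile("Sample.html")
			' Save the loaded content as an XML file.
			doc.SaveToFile("test.xml", FileFormat.Xml)
		End Sub
	End Class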
End Namespace
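
After the conversion it can be useful to confirm that the output parses as XML. The sketch below uses only the .NET `System.Xml.Linq` API and does not depend on Spire.Doc. The helper name `TryGetRootName` and the class `XmlOutputCheck` are hypothetical, and the file name `test.xml` is taken from the code above.

```csharp
using System;
using System.Xml;
using System.Xml.Linq;

static class XmlOutputCheck
{
    // Returns the local name of the root element if the file parses as XML,
    // or null if the parser rejects it. A missing file still throws
    // FileNotFoundException, which is deliberately not caught here.
    public static string TryGetRootName(string path)
    {
        try
        {
            XDocument xml = XDocument.Load(path);
            return xml.Root.Name.LocalName;
        }
        catch (XmlException)
        {
            return null;
        }
    }

    static void Main()
    {
        string root = TryGetRootName("test.xml");
        Console.WriteLine(root == null
            ? "test.xml is not well-formed XML."
            : "test.xml is well-formed; root element: " + root);
    }
}
```

This only checks well-formedness. It does not verify that the file is a valid Office OpenXML document or that Word can open it.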

Preview of the original HTML file:

HTML_effect_screenshot

Preview of the generated Office OpenXML file:

XML_effect_screenshot