How to Convert HTML to XML in C#, VB.NET

Office OpenXML has become a popular choice for delivering structured data on the Web, working hand-in-hand with HTML and complementing it. As a result, developers sometimes need to convert HTML to Office OpenXML. This article walks through the conversion using Spire.Doc, a professional Word library for .NET.

Before starting, complete the following preparatory work:

  • Download Spire.Doc and install it on your machine.
  • Add the Spire.Doc DLL files as references.
  • Open the bin folder and select the three DLL files under .NET 4.0.
  • Right-click the project and select Properties from its menu.
  • Set the target framework to .NET 4.
  • Add Spire.Doc as a namespace.

The following steps will show you how to do this with ease:

Step 1: Create a Word document.

Document doc = new Document();

Step 2: Load the HTML file.
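
The loading step can be sketched as follows. This is a minimal example, not the article's original code: the file name "sample.html" is a hypothetical placeholder, and the choice of XHTMLValidationType.None (which skips strict XHTML validation) is an assumption that suits loosely formed HTML.

```csharp
using Spire.Doc;
using Spire.Doc.Documents; // assumed location of XHTMLValidationType

class HtmlLoadSketch
{
    static void Main()
    {
        Document doc = new Document();

        // "sample.html" is a hypothetical input path; replace it with your own file.
        // XHTMLValidationType.None asks the loader not to enforce strict XHTML rules.
        doc.LoadFromFile("sample.html", FileFormat.Html, XHTMLValidationType.None);
    }
}
```

If the input is known to be valid XHTML, a stricter validation type may be preferable so that malformed markup is reported rather than silently accepted.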


Step 3: Save the document as an XML file.

doc.SaveToFile("test.xml", FileFormat.Xml);

Here is the full code in C# and VB.NET:

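// C#
static void Main(string[] args)
{
    Document doc = new Document();
    // Load the HTML file here (Step 2) before saving.
    doc.SaveToFile("test.xml", FileFormat.Xml);
}
' VB.NET
Shared Sub Main(ByVal args() As String)
    Dim doc As New Document()
    ' Load the HTML file here (Step 2) before saving.
    doc.SaveToFile("test.xml", FileFormat.Xml)
End Sub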

Preview of original HTML file.


Preview of generated Office OpenXML file.