How to Convert Word Doc to XML in C#, VB.NET
Basic Knowledge about Office OpenXML
When talking about Office OpenXML, we may think of HTML. Office OpenXML is an XML-based format, and XML is similar to HTML in that both are tag-based languages. The difference is that XML, unlike HTML, does not come with a fixed set of predefined tags. If we want to create our own tags, we need to follow a few rules.
Firstly, a document contains only one root element. The root element is often called the document element and appears after the prolog section. Besides, every element must be closed with an end tag, and the names in the start and end tags must be identical, including case. Also, elements can't overlap. What's more, all attribute values must be enclosed in quotation marks, and some special characters, such as < and &, can't be used directly within the text. A document that follows these rules is well-formed.
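The rules above can be checked mechanically with any XML parser. The following is a minimal sketch that uses XDocument.Parse from the .NET base class library (System.Xml.Linq), not Spire.Doc. The class name WellFormedCheck and the method name IsWellFormed are made up for this illustration.

```csharp
using System;
using System.Xml;
using System.Xml.Linq;

// Hypothetical helper for illustration; not part of Spire.Doc.
public static class WellFormedCheck
{
    // Returns true when the text parses as XML, i.e. it has a single root
    // element, matching and properly nested tags, quoted attribute values,
    // and no unescaped special characters in the text.
    public static bool IsWellFormed(string xml)
    {
        try
        {
            XDocument.Parse(xml);
            return true;
        }
        catch (XmlException)
        {
            // The parser rejects any document that breaks a rule.
            return false;
        }
    }
}
```

For example, IsWellFormed("<doc><p id=\"1\">text</p></doc>") returns true, while "<a/><b/>" (two root elements), "<a><b></a></b>" (overlapping elements), "<a id=1/>" (unquoted attribute value) and "<a>1 < 2</a>" (unescaped special character) all return false.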
Use C# and VB.NET to Convert Doc to Office OpenXML via Spire.Doc
Spire.Doc (Spire.Office) offers an easy way to convert Doc to Office OpenXML. With it, we can convert an existing Word doc file to Office OpenXML format with a few lines of code. Now, just follow the simple steps.
Step 1: Create Project
Download Spire.Doc and install it on your system. Create a project in Visual Studio and add the Spire.Doc DLL as a reference.
Note: Please make sure Spire.Doc and Visual Studio are correctly installed on the system.
Step 2: Load Word Doc File
Load the local Word doc file that we need to convert to Office OpenXML format. The following code loads it.
Document document = new Document();
document.LoadFromFile(@"D:\Sample.doc");
Step 3: Convert Doc to Office OpenXML
Spire.Doc supports converting Word Doc files to most popular file formats, such as PDF, HTML, Office OpenXML, EPub, RTF, Dot, and Text. Now, use the code below to convert Word to Office OpenXML.
document.SaveToFile("Sample.xml", FileFormat.Xml);
Step 4: Full Code
Now, write the full code into your project and press F5 to start the program. The C# version comes first, followed by the VB.NET version.
using System;
using System.Windows.Forms;
using Spire.Doc;
using Spire.Doc.Documents;

namespace toXML
{
    public partial class Form1 : Form
    {
        public Form1()
        {
            InitializeComponent();
        }

        private void button1_Click(object sender, EventArgs e)
        {
            // Load the Word document.
            Document document = new Document();
            document.LoadFromFile(@"D:\Sample.doc");

            // Save the document as XML.
            document.SaveToFile("Sample.xml", FileFormat.Xml);

            // Open the saved file with its default application.
            WordDocViewer("Sample.xml");
        }

        private void WordDocViewer(string fileName)
        {
            try
            {
                System.Diagnostics.Process.Start(fileName);
            }
            catch
            {
                // Failure to open the viewer is ignored; the file is already saved.
            }
        }
    }
}
Imports System
Imports System.Windows.Forms
Imports Spire.Doc
Imports Spire.Doc.Documents

Namespace toXML
    Partial Public Class Form1
        Inherits Form

        Public Sub New()
            InitializeComponent()
        End Sub

        Private Sub button1_Click(ByVal sender As Object, ByVal e As EventArgs)
            'Load the Word document.
            Dim document As New Document()
            document.LoadFromFile("D:\Sample.doc")

            'Save the document as XML.
            document.SaveToFile("Sample.xml", FileFormat.Xml)

            'Open the saved file with its default application.
            WordDocViewer("Sample.xml")
        End Sub

        Private Sub WordDocViewer(ByVal fileName As String)
            Try
                System.Diagnostics.Process.Start(fileName)
            Catch
                'Failure to open the viewer is ignored; the file is already saved.
            End Try
        End Sub
    End Class
End Namespace
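One caveat about launching the result: Process.Start(fileName) opens a document only when the process is started through the operating system shell. That is the default on .NET Framework, but on .NET Core and .NET 5 or later UseShellExecute defaults to false, so the call throws, and an empty catch block hides the failure. The sketch below sets the flag explicitly. FileLauncher, BuildStartInfo and Open are hypothetical names used only for this example.

```csharp
using System.Diagnostics;

// Hypothetical helper for illustration; not part of Spire.Doc.
public static class FileLauncher
{
    // Builds start info that asks the OS shell to open the file
    // with whatever application is registered for its extension.
    public static ProcessStartInfo BuildStartInfo(string fileName)
    {
        return new ProcessStartInfo(fileName) { UseShellExecute = true };
    }

    // Opens the file; exceptions are left to the caller so that
    // a failure to launch the viewer is visible rather than silent.
    public static void Open(string fileName)
    {
        Process.Start(BuildStartInfo(fileName));
    }
}
```

With this helper, the call in button1_Click could become FileLauncher.Open("Sample.xml"), wrapped in whatever error handling suits the application.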
After running the demo, you may find the Office OpenXML document opened in your browser.
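Besides viewing the file, it can be inspected in code. As a rough sketch, the snippet below loads an XML file with XDocument.Load from the .NET base class library and reports its single root element. XmlInspector and RootName are illustrative names, and the root name and namespace that Spire.Doc actually writes are not specified here, so no particular output is assumed.

```csharp
using System.Xml.Linq;

// Hypothetical helper for illustration; not part of Spire.Doc.
public static class XmlInspector
{
    // Loads the file (which throws XmlException if it is not well-formed)
    // and returns the root element name. For a namespaced root the result
    // has the form "{namespace}localName".
    public static string RootName(string path)
    {
        XDocument doc = XDocument.Load(path);
        return doc.Root.Name.ToString();
    }
}
```

Calling XmlInspector.RootName("Sample.xml") after the conversion shows which document element the saved file uses.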
Spire.Doc is an MS Word component that enables users to perform a wide range of Word document processing tasks directly, such as generating, reading, writing and modifying Word documents, for .NET and Silverlight.