Hello,
In our code, we need exceptional performance from Spire.Doc, because it has to parse the documents of millions of users. The most important point is that we don't know the format (document type) of a user's document. It can be anything, so we have to try to parse it, and if any text comes out, we process it and store it in an index. The goal is to extract all of the text the user's document contains.
We get each document as a binary stream from SQL.
With spire.doc.dll version 3.5.1, I was using the following code to accomplish this:
...
Document document = new Document();
try
{
document.LoadFromStream(binaryStream, FileFormat.Auto);
result = document.GetText();
}
catch
{
//Do nothing about documents that couldn't be parsed.
}
finally
{
document.Close();
binaryStream.Close();
}
...
It has some memory flaws that make the memory commit size balloon from 200 MB to 72 GB, but we increased the Windows page file size to compensate.
I upgraded to version 3.7.8 today. After upgrading, the code above no longer compiles. While looking at the compile errors, I noticed that the only two Spire.Doc features I have been using are missing. They are:
1- error CS0117: 'Spire.Doc.FileFormat' does not contain a definition for 'Auto'
2- error CS1061: 'Spire.Doc.Document' does not contain a definition for 'GetText' and no extension method 'GetText' accepting a first argument of type 'Spire.Doc.Document' could be found
Note that these are the only features we purchased Spire.Doc for (automatic file-format detection and extracting only the text, without formatting). It has done this job rather well so far. How can I accomplish the same thing in 3.7.8 without increasing memory usage?
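As a possible stopgap for the missing FileFormat.Auto, the format could be guessed by peeking at the first bytes of the stream before calling LoadFromStream. Below is a minimal sketch, assuming the stream from SQL is seekable. The enum and class names (SniffedFormat, FormatSniffer) are hypothetical, not part of Spire.Doc. The signatures checked are the standard ones for OLE compound files (legacy .doc), ZIP packages (.docx), RTF, PDF and HTML.

```csharp
using System;
using System.IO;
using System.Text;

// Hypothetical names: these are NOT Spire.Doc types. Mapping each value to a
// Spire.Doc FileFormat member is an assumption to verify against your version.
public enum SniffedFormat { Unknown, OleCompound, Zip, Rtf, Pdf, Html }

public static class FormatSniffer
{
    // OLE2 compound file header, used by legacy .doc (and also .xls/.ppt).
    private static readonly byte[] OleSignature = { 0xD0, 0xCF, 0x11, 0xE0, 0xA1, 0xB1, 0x1A, 0xE1 };
    // ZIP local file header, used by .docx/.odt packages (and any other ZIP).
    private static readonly byte[] ZipSignature = { 0x50, 0x4B, 0x03, 0x04 };

    // Peeks at up to 512 leading bytes, then restores the stream position
    // so the same stream can still be passed on to the parser.
    public static SniffedFormat Detect(Stream stream)
    {
        if (!stream.CanSeek)
            throw new ArgumentException("The stream must be seekable so it can be rewound.", nameof(stream));

        long start = stream.Position;
        var header = new byte[512];
        int read = 0;
        int n;
        while (read < header.Length && (n = stream.Read(header, read, header.Length - read)) > 0)
            read += n;
        stream.Position = start;

        if (StartsWith(header, read, OleSignature)) return SniffedFormat.OleCompound;
        if (StartsWith(header, read, ZipSignature)) return SniffedFormat.Zip;

        // Skip a UTF-8 byte order mark before looking for text-based signatures.
        int offset = (read >= 3 && header[0] == 0xEF && header[1] == 0xBB && header[2] == 0xBF) ? 3 : 0;
        string text = Encoding.ASCII.GetString(header, offset, read - offset).TrimStart();

        if (text.StartsWith("{\\rtf", StringComparison.Ordinal)) return SniffedFormat.Rtf;
        if (text.StartsWith("%PDF", StringComparison.Ordinal)) return SniffedFormat.Pdf;
        if (text.StartsWith("<!doctype html", StringComparison.OrdinalIgnoreCase) ||
            text.StartsWith("<html", StringComparison.OrdinalIgnoreCase))
            return SniffedFormat.Html;

        return SniffedFormat.Unknown;
    }

    // True if the first bytes of buffer (of which `length` are valid) match signature.
    private static bool StartsWith(byte[] buffer, int length, byte[] signature)
    {
        if (length < signature.Length) return false;
        for (int i = 0; i < signature.Length; i++)
            if (buffer[i] != signature[i]) return false;
        return true;
    }
}
```

An OleCompound or Zip result only narrows things down: an .xls or .xlsx shares the same container signature, so the parse attempt (and the catch block) would still be needed. The gain is that PDFs and unrecognized blobs could be skipped without paying for a full load attempt.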
Can anyone help?