Spire.XLS is a professional Excel API that enables developers to create, manage, manipulate, convert and print Excel worksheets. Get free and professional technical support for Spire.XLS for .NET, Java, Android, C++, Python.

Mon Feb 13, 2017 9:08 pm

I am having a performance issue when I am trying to generate an excel report using a template, then load data from a xml file. Here is my source code:

private void CreateReportWorkBook()
{
xml = XDocument.Load(@"D:\tfs\files\Test.xml");
try
{
if (xml != null)
{
lblMessage.Visible = false;
var rowArray = xml.Descendants("r").ToArray();
_rows = rowArray.Count();

//Reads the Excel template file into a fileStream and pass it as parameter to workbook constructor.

if (string.IsNullOrEmpty(_templatePath) || !File.Exists(_templatePath))
throw new Exception("Templete file not found.");

_fileStream = new FileStream(_templatePath, FileMode.Open, FileAccess.Read);
_myWorkBook = new Workbook();
_myWorkBook.LoadFromStream(_fileStream);

_myWorkSheet = _myWorkBook.Worksheets[0];
_myWorkSheet.Name = _sheetName;

//read data row by row from xml file and insert into new excel file
for (int i = 0; i < _rows; i++)
{
//As the first row (index:0) in the excel template contains the report Heading. Hence, adding 1 to the index number and starting from next row.
int rowNumber = _isDocumentTypePDF ? i + 3 : i + 1;
int rowCount = _isDocumentTypePDF ? _myWorkSheet.Range.RowCount + 2 : _myWorkSheet.Range.RowCount;
//Gets the row count from the template : If the count is less than or equal to the total row number of the xml, then it inserts a new row into excel file.
if (rowCount <= rowNumber)
{
_myWorkSheet.InsertRow(rowNumber + 1, 1, InsertOptionsType.FormatAsAfter);
}
//Gets total number of columns from xml file.
int columnsCount = rowArray[i].Nodes().Count();
if (_myWorkSheet.Columns.Count() < columnsCount)
{
columnsCount = _myWorkSheet.Columns.Count();
}
for (int j = 0; j < columnsCount; j++)
{
var value = rowArray[i].Element("c" + (j + 1)).Value;
//Checks if the Data Format received from the template file is of type "Number", thats holds only numeric data, and applies equivalent NPOI Data Format for the particular cell.
if (_myWorkSheet.Rows[rowNumber].Cells[j].NumberFormat.Contains("0") && Regex.Matches(value, @"[a-zA-Z]").Count == 0 && (int.TryParse(value, out _tempInt) || double.TryParse(value, out _tempDouble) || float.TryParse(value, out _tempFloat)))
{
var value1 = _tempInt > 0 ? _tempInt : _tempFloat > 0 ? _tempFloat : _tempDouble > 0 ? _tempDouble : 0;
_myWorkSheet.Rows[rowNumber].Cells[j].NumberValue = value1;
}
//Checks if the Data Format for the cell is of type 'Date', and applies equivalent NPOI Data Format for the particular cell.
else if (_myWorkSheet.Rows[rowNumber].Cells[j].NumberFormat.Contains("d") && Regex.Matches(value, @"[a-zA-Z]").Count == 0 && !string.IsNullOrEmpty(value))
{
var date = new SimpleDateFormat("yyyy-mm-dd hh:mm:ss");
var dateVal = date.Parse(value);
_myWorkSheet.Rows[rowNumber].Cells[j].DateTimeValue = dateVal;
}
else
{
if (!string.IsNullOrEmpty(value))
{
_myWorkSheet.Rows[rowNumber].Cells[j].Text = value;
}
}
}
}
_ms = new MemoryStream();
_myWorkBook.SaveToStream(_ms,Spire.Xls.FileFormat.Version2007);

if (_isDocumentTypePDF)
GeneratePDFReportDocument();
}
else
{
throw new Exception(Constants.REPORT_DATA_XMLFILE_NOT_AVAILABLE);
}
}
catch (Exception ex)
{
string message = string.Format("{0} <br/> Error Message: {1}", Constants.REPORT_EXCEL_WORKBOOK_EXCEPTION, ex.Message);
throw new Exception(message);
}
}

The issue is: it is very slow processing the data. I have a source xml file with 560 rows of data into, it takes about 1 min 45 seconds to run through the above code to generate a excel file. Could you please give me some insight on what am I doing wrong?
I attached both template file and the xml data source file with this post. Please feel free to take a look.
Thanks!
Attachments
Files.zip
Template and source data Xml file
(35.2 KiB) Downloaded 334 times

lsong
 
Posts: 36
Joined: Wed Feb 01, 2017 4:13 pm

Tue Feb 14, 2017 6:26 am

Dear lsong,

Thanks for your inquiry.
I have tested the code with the latest Spire.XLS Pack Hotfix Version:7.11.49, it takes about 12 second. Please try to use this version. In addition, I have noticed there is only 122 rows in xml file, but you said it has 560 rows, is the sample xml wrong ?
I change little code as something missing, and here is my entire testing code.
Code: Select all
        public static bool _isDocumentTypePDF = false;
        public static int tempInt ;
        public static double _tempDouble;
        public static float _tempFloat;
       static public void CreateReportWorkBook9752()
        {
            System.Diagnostics.Stopwatch stopwatch = new System.Diagnostics.Stopwatch();
            stopwatch.Start();
            var xml = XDocument.Load(@"F:\testing\xls test form\sample document\Test.xml");
                if (xml != null)
                {
                    var rowArray = xml.Descendants("r").ToArray();
                    var _rows = rowArray.Count();

                    var _fileStream = new FileStream(@"F:\testing\xls test form\sample document\ActiveParticipantsTemplate.xlsx", FileMode.Open, FileAccess.Read);
                    var _myWorkBook = new Workbook();
                     _myWorkBook.LoadFromStream(_fileStream);

                    var _myWorkSheet = _myWorkBook.Worksheets[0];
                    _myWorkSheet.Name ="MySheet";

                    //read data row by row from xml file and insert into new excel file
                    for (int i = 0; i < _rows; i++)
                    {
                        //As the first row (index:0) in the excel template contains the report Heading. Hence, adding 1 to the index number and starting from next row.
                        int rowNumber = _isDocumentTypePDF ? i + 3 : i + 1;
                        int rowCount = _isDocumentTypePDF ? _myWorkSheet.Range.RowCount + 2 : _myWorkSheet.Range.RowCount;
                        //Gets the row count from the template : If the count is less than or equal to the total row number of the xml, then it inserts a new row into excel file.
                        if (rowCount <= rowNumber)
                        {
                            _myWorkSheet.InsertRow(rowNumber + 1, 1, InsertOptionsType.FormatAsAfter);
                        }
                        //Gets total number of columns from xml file.
                        int columnsCount = rowArray[i].Nodes().Count();
                        if (_myWorkSheet.Columns.Count() < columnsCount)
                        {
                            columnsCount = _myWorkSheet.Columns.Count();
                        }
                        for (int j = 0; j < columnsCount; j++)
                        {
                            var value = rowArray[i].Element("c" + (j + 1)).Value;
                            //Checks if the Data Format received from the template file is of type "Number", thats holds only numeric data, and applies equivalent NPOI Data Format for the particular cell.
                            if (_myWorkSheet.Rows[rowNumber].Cells[j].NumberFormat.Contains("0") && Regex.Matches(value, @"[a-zA-Z]").Count == 0 && (int.TryParse(value, out tempInt) || double.TryParse(value, out _tempDouble) || float.TryParse(value, out _tempFloat)))
                            {
                                var value1 = tempInt > 0 ? tempInt : _tempFloat > 0 ? _tempFloat : _tempDouble > 0 ? _tempDouble : 0;
                                _myWorkSheet.Rows[rowNumber].Cells[j].NumberValue = value1;
                            }
                            //Checks if the Data Format for the cell is of type 'Date', and applies equivalent NPOI Data Format for the particular cell.
                            else if (_myWorkSheet.Rows[rowNumber].Cells[j].NumberFormat.Contains("d") && Regex.Matches(value, @"[a-zA-Z]").Count == 0 && !string.IsNullOrEmpty(value))
                            {
                                _myWorkSheet.Rows[rowNumber].Cells[j].DateTimeValue = DateTime.Now;
                            }
                            else
                            {
                                if (!string.IsNullOrEmpty(value))
                                {
                                    _myWorkSheet.Rows[rowNumber].Cells[j].Text = value;
                                }
                            }
                        }             
                    }
                    _myWorkBook.SaveToFile("9752.xlsx",ExcelVersion.Version2010);
                    _myWorkBook.SaveToFile("result9752.pdf",FileFormat.PDF);
                }
                stopwatch.Stop();
                TimeSpan timespan = stopwatch.Elapsed;
                //it takes about 12s.
                double seconds = timespan.TotalSeconds;
        }


Sincerely,
Betsy
E-iceblue support team
User avatar

Betsy.jiang
 
Posts: 3099
Joined: Tue Sep 06, 2016 8:30 am

Tue Feb 14, 2017 3:44 pm

Thanks for your reply. I sent you a wrong data file in the previous post. I am attaching the right data file with 576 rows of data with this post now. I ran through it using your test code with the latest Spire.XLS Pack Hotfix Version:7.11.49, And it took about 130 seconds. I also ran the same code with the data file I sent you yesterday which has 122 rows of data, I got about 12 seconds run time which is the same as your result. Could you please try with the data file attached to see why it takes so long to process 576 rows of data?
BTW, If you need to compile a new Hot fix for the Spire.Office, can you please make sure it’s publish on NuGet package? That way we can get it directly and apply it to our project.

Thanks!
Lucy
Attachments
Files.zip
(38.43 KiB) Downloaded 283 times

lsong
 
Posts: 36
Joined: Wed Feb 01, 2017 4:13 pm

Wed Feb 15, 2017 3:07 am

Dear Lucy,

Thanks for the information.
I have tested the new files with the latest Spire.Office Platinum (Hot Fix) Version:2.15.4, it takes about 110 seconds. And I have posted the issue to our Dev team. Once there is any progress, we will let you know soon. And if it needs to compile a new hotfix of Spire.Office, we will update it on NuGet and then inform you.
Sorry for inconvenience caused.

Sincerely,
Betsy
E-iceblue support team
User avatar

Betsy.jiang
 
Posts: 3099
Joined: Tue Sep 06, 2016 8:30 am

Thu Feb 16, 2017 8:28 am

Dear Lucy,

After further investigation,we found the solution. Please add the below line into your code.
Code: Select all
CellRange[] Cells = _myWorkSheet.Rows[rowNumber].Cells;

And then use Cells[j] to replace the code( _myWorkSheet.Rows[rowNumber].Cells[j]), it needn't traverse the row again to get the cell. Here is entire code for your reference.
Code: Select all
       static public void CreateReportWorkBook9752()
        {
            System.Diagnostics.Stopwatch stopwatch = new System.Diagnostics.Stopwatch();
            stopwatch.Start();
            var xml = XDocument.Load(@"F:\testing\xls test form\sample document\Test2.xml");
                if (xml != null)
                {
                    var rowArray = xml.Descendants("r").ToArray();
                    var _rows = rowArray.Count();

                    var _fileStream = new FileStream(@"F:\testing\xls test form\sample document\ActiveParticipantsTemplate2.xlsx", FileMode.Open, FileAccess.Read);
                    var _myWorkBook = new Workbook();
                     _myWorkBook.LoadFromStream(_fileStream);

                    var _myWorkSheet = _myWorkBook.Worksheets[0];
                    _myWorkSheet.Name ="MySheet";

                    //read data row by row from xml file and insert into new excel file
                    for (int i = 0; i < _rows; i++)
                    {
                        //As the first row (index:0) in the excel template contains the report Heading. Hence, adding 1 to the index number and starting from next row.
                        int rowNumber = _isDocumentTypePDF ? i + 3 : i + 1;
                        int rowCount = _isDocumentTypePDF ? _myWorkSheet.Range.RowCount + 2 : _myWorkSheet.Range.RowCount;
                        //Gets the row count from the template : If the count is less than or equal to the total row number of the xml, then it inserts a new row into excel file.
                        if (rowCount <= rowNumber)
                        {
                            _myWorkSheet.InsertRow(rowNumber + 1, 1, InsertOptionsType.FormatAsAfter);
                        }
                        //Gets total number of columns from xml file.
                        int columnsCount = rowArray[i].Nodes().Count();
                        if (_myWorkSheet.Columns.Count() < columnsCount)
                        {
                            columnsCount = _myWorkSheet.Columns.Count();
                        }
                        //add the line here.
                        CellRange[] Cells = _myWorkSheet.Rows[rowNumber].Cells;
                        for (int j = 0; j < columnsCount; j++)
                        {
                            var value = rowArray[i].Element("c" + (j + 1)).Value;
                            //Checks if the Data Format received from the template file is of type "Number", thats holds only numeric data, and applies equivalent NPOI Data Format for the particular cell.
                            if (Cells[j].NumberFormat.Contains("0") && Regex.Matches(value, @"[a-zA-Z]").Count == 0 && (int.TryParse(value, out tempInt) || double.TryParse(value, out _tempDouble) || float.TryParse(value, out _tempFloat)))
                            {
                                var value1 = tempInt > 0 ? tempInt : _tempFloat > 0 ? _tempFloat : _tempDouble > 0 ? _tempDouble : 0;
                                Cells[j].NumberValue = value1;
                            }
                            //Checks if the Data Format for the cell is of type 'Date', and applies equivalent NPOI Data Format for the particular cell.
                            else if (Cells[j].NumberFormat.Contains("d") && Regex.Matches(value, @"[a-zA-Z]").Count == 0 && !string.IsNullOrEmpty(value))
                            {
                                Cells[j].DateTimeValue = DateTime.Now;
                            }
                            else
                            {
                                if (!string.IsNullOrEmpty(value))
                                {
                                    Cells[j].Text = value;
                                }
                            }
                        }             
                    }
                    _myWorkBook.SaveToFile("9752-2.xlsx",ExcelVersion.Version2010);
                    _myWorkBook.SaveToFile("result9752-2.pdf",FileFormat.PDF);
                }
                stopwatch.Stop();
                TimeSpan timespan = stopwatch.Elapsed;
                double seconds = timespan.TotalSeconds;
        }

If there is any question, please let me know.

Sincerely,
Betsy
E-iceblue support team
User avatar

Betsy.jiang
 
Posts: 3099
Joined: Tue Sep 06, 2016 8:30 am

Thu Feb 16, 2017 2:49 pm

That fixed the issue. Thank you so much for your help!

Lucy

lsong
 
Posts: 36
Joined: Wed Feb 01, 2017 4:13 pm

Fri Feb 17, 2017 1:40 am

Dear Lucy,

Thanks for your feedback.
Welcome to write to us for further question. We will be happy to help you.

Sincerely,
Betsy
E-iceblue support team
User avatar

Betsy.jiang
 
Posts: 3099
Joined: Tue Sep 06, 2016 8:30 am

Wed Sep 13, 2017 3:36 pm

Hi,
Sorry to bring up this old issue again. Now we have larger files coming in, the total number of records are around 10000 to 50000. When we generate excel file with 10000 records, it takes about 4 to 5 minutes which is too long for website operation. Do you have any idea on how to optimize the process to make it much faster?

I am also trying use InsertDataTable to insert data into worksheet. The statement is as follows:
_myWorkSheet.InsertDataTable(dt, false, 5, 1, true);

Since our template has 4 rows, we insert data starting at 5th row. However, the output excel file always has the first data row empty. Please see the attached screenshot, you will find data doesn't start with the row, it starts with the 6th row instead. Do you know how to resolve that?


Thanks,
Lucy
Attachments
AFContributionReportTemplate.zip
(33.81 KiB) Downloaded 278 times
Capture.PNG
Capture.PNG (55.95 KiB) Viewed 4446 times

lsong
 
Posts: 36
Joined: Wed Feb 01, 2017 4:13 pm

Thu Sep 14, 2017 8:13 am

Dear Lucy,

Thanks for your inquiry.
For the time issue about larger files, I am afraid it is difficult to make it much faster using the previous code because of a lot of data.

And I tried InsertDataTable method with the latest Spire.Office Platinum (Hot Fix) Version:2.16.12, but I didn't reproduce your issue, and it started at 5th row. Maybe you could try that version, and here is my testing code.
Code: Select all
       public static void InsertData11587()
       {
           Workbook workbook = new Workbook();
           workbook.LoadFromFile(@"F:\AFContributionReportTemplate.xlsx");
           Worksheet sheet = workbook.Worksheets[0];
           DataTable dt = GetData();
           sheet.InsertDataTable(dt, false, 5, 1, true);
           workbook.SaveToFile("11587.xlsx",ExcelVersion.Version2013);
       }
       private static DataTable GetData()
       {
           DataTable dt = new DataTable();
           dt.Columns.Add("column1", typeof(string));
           dt.Columns.Add("column2", typeof(string));
           DataRow row = dt.NewRow();
           row[0] = "test1";
           row[1] = "test2";
           dt.Rows.Add(row);
           return dt;
       }

By the way, I found the InsertDataTable method is much faster than the the way in previous post. So I suggest you use this method.

Thanks,
Betsy
E-iceblue support team
User avatar

Betsy.jiang
 
Posts: 3099
Joined: Tue Sep 06, 2016 8:30 am

Mon Sep 18, 2017 8:28 pm

Hi,
I tried InsertDataTable method and am having an issue with column format. We call InsertDataTable as follows:
_myWorkSheet.InsertDataTable(dt, false, rowNumber, 1, false);

So for the columns with "Text" type in the template and number type in data, it displays the data as number format in output excel. It looks to me the template type is overwritten. For example, we have the first column "Participant Number" which has values like "2737070" in the datatable. But it is shown as "2737070.00" in output excel. We want it shown as the type defined in the template which is Text type because we want to keep the leading 0s in the "Participant Number" column. I have attached the template and screenshot of output excel.
We also tried to call InsertDataTable as follows:
_myWorkSheet.InsertDataTable(dt, false, rowNumber, 1, true);
this time the Text columns shows fine but columns with "Currency" and "Percentage" types didn't show with "$"and "%" sign.

Could you please let me know how to solve this?

Thanks,
Lucy
Attachments
ActiveParticipantsTemplate.zip
(43.24 KiB) Downloaded 237 times
Capture.PNG
Capture.PNG (70.49 KiB) Viewed 4405 times

lsong
 
Posts: 36
Joined: Wed Feb 01, 2017 4:13 pm

Tue Sep 19, 2017 2:27 am

Dear Lucy,

Thanks for your inquiry.
Please set number format after inserting data, and here is code to format the column as text.
Code: Select all
            sheet.Columns[0].NumberFormat ="@";

Any question, please let me know.

Thanks,
Betsy
E-iceblue support team
User avatar

Betsy.jiang
 
Posts: 3099
Joined: Tue Sep 06, 2016 8:30 am

Wed Sep 27, 2017 8:50 am

Hello,

Has the issue get resolved? Could you give us some feedback?

Thanks,
Gary
E-iceblue support team
User avatar

Gary.zhang
 
Posts: 1380
Joined: Thu Apr 04, 2013 1:30 am

Return to Spire.XLS