Spire.XLS is a professional Excel API that enables developers to create, manage, manipulate, convert and print Excel worksheets. Get free and professional technical support for Spire.XLS for .NET, Java, Android, C++, Python.

Sat Apr 24, 2021 2:50 am

How do I find and remove duplicate rows from a worksheet using spire.xls?
Also,how do I remove extra whitespace from cells that have them?

Jayanta.KMC
 
Posts: 21
Joined: Sat Mar 06, 2021 2:42 pm

Sun Apr 25, 2021 9:41 am

Hello,

Thank you for your inquiry.
Below are my answers to your questions.
1) Please refer to the following code to remove the duplicate rows. If this can't meet your need well, please provide your sample Excel file for a better investigation.
Code: Select all
Workbook workBook = new Workbook();
workBook.LoadFromFile("input.xlsx");
Worksheet sheet = workBook.Worksheets[0];
for (int i = 0; i < sheet.Columns[0].Count - 1; i++)
{
    for (int j = i + 1; j < sheet.Columns[0].Count; j++)
    {
        if (sheet.Columns[0].CellList[i].Value == sheet.Columns[0].CellList[j].Value)
        {
            for (int col = 0; col < sheet.Rows[0].Count; col++)
            {
                if (col == sheet.Rows[0].Count - 1 && sheet.Rows[i].CellList[col].Value == sheet.Rows[j].CellList[col].Value)
                {
                    sheet.DeleteRow(j + 1);
                    j--;
                }
                else if (col != sheet.Rows[0].Count - 1 && sheet.Rows[i].CellList[col].Value == sheet.Rows[j].CellList[col].Value)
                {
                    continue;
                }
                else
                {
                    break;
                }
            }
        }
    }
    workBook.SaveToFile("output.xlsx", FileFormat.Version2016);

2) Do you want to remove the blank cells from sheet? If so, please refer to the code below. If there is any misunderstanding, please provide more detailed information. Thanks in advance.
Code: Select all
            for (int i = 1; i <= sheet.LastRow; i++)
            {
                for (int j = 1; j <= sheet.LastColumn; j++)
                {
                    CellRange range = sheet.Range[i, j];
                    if (range.IsBlank)
                    {
                        sheet.DeleteRange(range, DeleteOption.MoveUp);
                    }
                }
            }


Sincerely,
Annika
E-iceblue support team
User avatar

Annika.Zhou
 
Posts: 1643
Joined: Wed Apr 07, 2021 2:50 am

Sun Apr 25, 2021 10:31 am

Hi,

Thanks for your reply.

Do you want to remove the blank cells from sheet?


Actually I want to find and remove leading & trailing whitespaces from the cells in a sheet and keep the data, only remove excess whitespaces if any like you know using the Trim() method for strings something like that.

Sincerely,
Jayanta

Jayanta.KMC
 
Posts: 21
Joined: Sat Mar 06, 2021 2:42 pm

Mon Apr 26, 2021 2:05 am

Hello Jayanta,

Thank you for your feedback.
For deleting Leading & Trailing Whitespaces, please refer to the code below. If there is still any question, please provide your sample Excel file as well as the expected output for a better reference.
Code: Select all
           Workbook workBook = new Workbook();
           workBook.LoadFromFile("input.xlsx");
            Worksheet sheet = workBook.Worksheets[0];
            for (int i = 1; i <= sheet.LastRow; i++)
            {
                for (int j = 1; j <= sheet.LastColumn; j++)
                {
                    CellRange range = sheet.Range[i, j];
                    String str = range.Value;
                    String strings = str.Trim();
                    range.Value = strings;     
                }
            }
          workBook.SaveToFile("output.xlsx",FileFormat.Version2016);


Sincerely,
Annika
E-iceblue support team
User avatar

Annika.Zhou
 
Posts: 1643
Joined: Wed Apr 07, 2021 2:50 am

Thu May 06, 2021 9:07 am

Hello,

Greetings from e-iceblue.
Does the code we provide meet your needs? Could you please give us some feedback at your convenience? Thanks in advance.

Sincerely,
Annika
E-iceblue support team
User avatar

Annika.Zhou
 
Posts: 1643
Joined: Wed Apr 07, 2021 2:50 am

Fri May 07, 2021 12:42 am

Hi,

Thanks.

Your code works :D

Jayanta.KMC
 
Posts: 21
Joined: Sat Mar 06, 2021 2:42 pm

Fri May 07, 2021 1:35 am

Hello,

Thank you for your feedback. If you have other questions about using Spire.XLS in the future, please feel free to contact us.

Sincerely,
Annika
E-iceblue support team
User avatar

Annika.Zhou
 
Posts: 1643
Joined: Wed Apr 07, 2021 2:50 am

Fri May 21, 2021 4:44 pm

Hi,

Regarding the duplicate row removal code of yours, it is taking a long time to complete I mean a really long time for large files (basically the PC literally hangs), so I tried a different approach as shown below :

Code: Select all
targetWorkbook.LoadFromFile(filePath);
targetSheet = targetWorkbook.Worksheets[0];
var cellRanges = targetSheet.Range["A2:G" + targetSheet.Columns[0].Count];
RemoveDuplicate(targetSheet, cellRanges);


Code: Select all
      private void RemoveDuplicate(Worksheet sheet, CellRange cellRanges)
      {
         // get the duplicated row numbers by 1st column data
         var duplicatedRows = cellRanges.Rows
            .GroupBy(x=> x.Columns[0].Value)
            .Where(x=> x.Count() > 1)
            .Select(x=> x.Last().Columns[0].Row)
            .ToList();
         
         foreach (var element in duplicatedRows)
         {
            sheet.DeleteRow(element);
         }
         
      }


But it isn't doing the job..what am I doing wrong here? Am I getting the last filled row value for 1st column wrong or is there other issues?

Jayanta.KMC
 
Posts: 21
Joined: Sat Mar 06, 2021 2:42 pm

Mon May 24, 2021 7:49 am

Hi,
Thank you for your inquiry.
Please refer to the following code:
Code: Select all
        static void Main(string[] args)
        {           
            Workbook workbook = new Workbook();         
            workbook.LoadFromFile(filePath);
            Worksheet sheet = workbook.Worksheets[0];

            var rangs = sheet.Range[2, 1, sheet.LastRow, sheet.LastColumn];
            RemoveDuplicate(sheet,rangs);
            workbook.SaveToFile("out.xlsx");

        }
        private static void RemoveDuplicate(Worksheet sheet, CellRange cellRanges)
        {
            // get the duplicated row numbers
            var duplicatedRows = cellRanges.Rows
               .GroupBy(x => x.Columns[0].DisplayedText)
                   .Where(x => x.Count() > 1)
                   .SelectMany(x => x.Skip(1))
                   .Select(x => x.Columns[0].Row)
                   .ToList();
            for (int i = 0; i < duplicatedRows.Count; i++)
            {
                sheet.DeleteRow(duplicatedRows[i] - i);
            }
        }

Sincerely,
Annika
E-iceblue support team
User avatar

Annika.Zhou
 
Posts: 1643
Joined: Wed Apr 07, 2021 2:50 am

Mon May 24, 2021 3:59 pm

It is faster than the previous method... But not as fast the Microsoft Interop method of below
Code: Select all
            var excel = new Excel.Application();
            var workbook = excel.Workbooks.Open(@"C:\Test\Test.xlsx");
            Excel.Worksheet worksheet = workbook.Sheets[1];

            var usedRange = worksheet.UsedRange;

            usedRange.RemoveDuplicates(1);

            workbook.Close(true);

            Marshal.ReleaseComObject(excel);


I was hoping that the spire.xls code would be faster than Interop but that didn't seem to be the case here..

Anyways, thank you for your reply.

Jayanta.KMC
 
Posts: 21
Joined: Sat Mar 06, 2021 2:42 pm

Tue May 25, 2021 7:02 am

Hello,

Thank you for your feedback.
Our Spire.XLS is different from Interop, they have different internal algorithm, hence the existence of difference on time is inevitable. Could you please provide your Excel document and tell us the time it cost on your side when testing my code, then I will investigate whether it can be optimized further.

Sincerely,
Annika
E-iceblue support team
User avatar

Annika.Zhou
 
Posts: 1643
Joined: Wed Apr 07, 2021 2:50 am

Wed May 26, 2021 2:11 am

Hello,

Please find attached the sample file for your testing.

I've found the Free Spire.XLS code approximately 11 to 12 times slower than Microsoft Excel Interop method on the attached file for duplicate row removal.

Please have a look and see it can be optimized because the difference is quite substantial.

Thank you.
Attachments
Test.zip
(79.89 KiB) Downloaded 288 times

Jayanta.KMC
 
Posts: 21
Joined: Sat Mar 06, 2021 2:42 pm

Wed May 26, 2021 9:43 am

Hello,

Thank you for your sharing.
I used the latest commercial version of Spire.XLS (Spire.XLS Pack (Hotfix) Version: 11.4.6) to test the excel file you provided, and it only took about 1.3 seconds. Using the free version (Free Spire.XLS for .NET Version: 10.10) took about 4.3 seconds. The latest commercial version is almost as efficient as Interop. It has a better performance than the free version. Please use the latest commercial version to have a test.

Sincerely,
Annika
E-iceblue support team
User avatar

Annika.Zhou
 
Posts: 1643
Joined: Wed Apr 07, 2021 2:50 am

Tue Jun 08, 2021 7:04 am

Hello,

Greetings from e-iceblue.
How is your issue going? Are you using the latest commercial version for testing? Can you give us some feedback at your convenience? Thanks in advance.

Sincerely,
Annika
E-iceblue support team
User avatar

Annika.Zhou
 
Posts: 1643
Joined: Wed Apr 07, 2021 2:50 am

Thu Aug 12, 2021 4:56 pm

Hi ,

I am trying to use below code for removing Duplicates from .CSV file in Excel 365. As its parsing all rows and columns, its talking ages approx 40 mins to remove dups. I have not seen working, but I had to stop code execution as its too much slow. is there anything I am doing wrong ? All I am doing is changing input.xlsx to input.csv.

Do you think code should work for csv as well?

I see its trying to parse all rows ( 1048556) and cols (163822) irrespective of data available or not.

Is thr any way to work with just used row/cols for csv.

KhushiMehra
 
Posts: 1
Joined: Thu Aug 12, 2021 4:52 pm

Return to Spire.XLS