Spire.Doc is a professional Word .NET library specifically designed for developers to create, read, write, convert and print Word document files. Get free and professional technical support for Spire.Doc for .NET, Java, Android, C++, Python.

Mon Oct 14, 2024 4:56 pm

I'm a trial user of Spire modules.

In Spire.Doc, my code loops through the tables of a document section. How do I get each table's title?

Also, is there documentation for Spire.Doc that lists all the available objects (or items) and their usage, such as sample code showing how to set and get them?


TIA

pbsd92128
 
Posts: 11
Joined: Tue Oct 08, 2024 9:14 pm

Tue Oct 15, 2024 6:29 am

Hello,

Thanks for your inquiry. You can refer to the following code to obtain the title of each table. Additionally, you can learn about Spire.Doc's API through this link (https://www.e-iceblue.com/misc/apireference.html). For examples, you can download the 'Spire.Doc Pack' from this link (https://www.e-iceblue.com/Download/down ... t-now.html) or refer to this link (https://www.e-iceblue.com/Tutorials/Spi ... ntent.html). If you cannot obtain the title, please provide us with your input Word document for investigation. You can upload it here or send it to us via email ( [email protected] ). Thank you in advance.
Code: Select all
  using System;
  using Spire.Doc;

  // Load the document and print the Title property of every table
  Document document = new Document(@"TableSample.docx");
  foreach (Section section in document.Sections)
  {
      foreach (DocumentObject documentObject in section.Body.ChildObjects)
      {
          // Skip paragraphs and other non-table objects
          if (documentObject is Table table)
          {
              Console.WriteLine(table.Title);
          }
      }
  }

Sincerely,
Amin
E-iceblue support team

Amin.Gan
 
Posts: 306
Joined: Mon Jul 15, 2024 5:40 am

Tue Oct 15, 2024 2:26 pm

Thank you for your response!

I forgot to mention that I'm using Python. When I tried the following test code, I got empty output -

Code: Select all
# Get a table
table = section.Tables.get_Item(j)
print(table.Title)



Did I do anything wrong?

pbsd92128
 
Posts: 11
Joined: Tue Oct 08, 2024 9:14 pm

Wed Oct 16, 2024 1:42 am

Hello,

Thanks for your feedback. I tested it using the following code and everything worked fine. If you are not using the latest version (Spire.Doc for Python version 12.7.1), please update and try again. If the issue persists after the update, please provide us with your input Word document for investigation. You can upload it here or send it to us via email ( [email protected] ). Thank you in advance.

Code: Select all
from spire.doc import *
from spire.doc.common import *

# Create a Document object
doc = Document()

# Load the input document
doc.LoadFromFile("TableSample.docx")

for s in range(doc.Sections.Count):
    # Get a section
    section = doc.Sections.get_Item(s)
    # Get the tables in the section
    tables = section.Tables
    for i in range(tables.Count):
        # Print the table's Title property
        print(tables.get_Item(i).Title)

Sincerely,
Amin
E-iceblue support team

Amin.Gan
 
Posts: 306
Joined: Mon Jul 15, 2024 5:40 am

Mon Oct 28, 2024 7:51 pm

Thank you for your previous response. I had an account (login) issue, so my reply is delayed.

I'm uploading a test document into which I threw some random content and tables.

Here are my challenges with your modules -
    - I still can't get the table name(s), even with the sample code
    - Can the module process/convert the tables without using 'section' (i.e. convert tables one by one)?
    - I would like to combine the tables that are broken into 3 parts (1/3, 2/3, and 3/3) in my original document because there are too many columns for pdf/word; that's one of the reasons we want to use xls
    - This test document also has a scenario where a table spans two pages (in the word doc), and one table spanned multiple pages in the original <pdf> document (indicated by "Table xx.xx - continued from previous page"). I hope your modules can handle these without too much work

Thank you!

pbsd92128
 
Posts: 11
Joined: Tue Oct 08, 2024 9:14 pm

Tue Oct 29, 2024 7:29 am

Hello,

Thanks for your feedback. Based on the Word document you provided, I have adjusted the following code for your reference on how to obtain the table titles. Additionally, because of the way Word documents are structured, tables can only be retrieved from each section and then processed. Finally, do you mean that you want to merge the tables that were divided into three parts in the original document into one Excel document? If I misunderstood your requirements, please share more details for reference. Thanks in advance.
Code: Select all
from spire.doc import *
from spire.doc.common import *

# Create a Document object
doc = Document()

# Load input document
doc.LoadFromFile("tableName.docx")

# Traverse every section in the document
for i in range(doc.Sections.Count):
    section = doc.Sections.get_Item(i)
    # Collection of tables in the section
    tables = section.Tables
    # Traverse the collection of tables
    for index in range(tables.Count):
        # Position of the table among the section body's child objects
        tableIndex = section.Body.ChildObjects.IndexOf(tables[index])
        # A table that is the first child has no preceding paragraph
        if tableIndex < 1:
            continue
        # Retrieve the object immediately before the table
        documentObject = section.Body.ChildObjects.get_Item(tableIndex - 1)
        if isinstance(documentObject, Paragraph):
            paragraph = Paragraph(documentObject)
            # The preceding paragraph's text serves as the table title
            print(paragraph.Text)
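
The lookup above takes whatever object sits immediately before the table. As a library-independent sketch of the same idea, assume the section body is modelled as a plain list of ("paragraph", text) and ("table", rows) tuples. This is a hypothetical stand-in, not Spire.Doc objects. The sketch also skips empty spacer paragraphs and copes with a table that is the first item:

```python
def captions_for_tables(body):
    """Map each table's position in `body` to the nearest non-empty paragraph before it.

    `body` is a hypothetical stand-in for section.Body.ChildObjects:
    a list of ("paragraph", text) or ("table", rows) tuples.
    """
    captions = {}
    for pos, (kind, _) in enumerate(body):
        if kind != "table":
            continue
        caption = None
        # Walk backwards until a non-empty paragraph or another table is found.
        for prev_kind, prev_value in reversed(body[:pos]):
            if prev_kind == "table":
                break
            if prev_value.strip():
                caption = prev_value.strip()
                break
        captions[pos] = caption
    return captions


body = [
    ("table", [["a"]]),                           # nothing before it
    ("paragraph", "Table 1.1 - Versions (1/3)"),
    ("paragraph", ""),                            # empty spacer paragraph
    ("table", [["b"]]),
]
result = captions_for_tables(body)
```

Whether the real document has spacer paragraphs between a caption and its table is not known from this thread; the backward walk simply makes the lookup tolerant of them.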

Sincerely,
Amin
E-iceblue support team

Amin.Gan
 
Posts: 306
Joined: Mon Jul 15, 2024 5:40 am

Tue Oct 29, 2024 2:45 pm

Thank you Amin for your prompt response!

It's interesting that the 'paragraph' object is used to get the table title; good to know.

Yes, I would like to combine those adjacent tables with 1/3, 2/3, and 3/3.

I'm attaching a new sample doc (with a single table added), and a screenshot of the tables I would like to have in the output excel book. In this new doc, I'm expecting to get 8 tables (marked in the screenshot).

Can you provide a code snippet to do that?

Thank you!

pbsd92128
 
Posts: 11
Joined: Tue Oct 08, 2024 9:14 pm

Wed Oct 30, 2024 9:42 am

Hello,

Thanks for your inquiry. You can refer to the following code, which uses Spire.Doc and Spire.XLS to write data from Word tables into an Excel file. If you have any further questions, please feel free to provide feedback.
Code: Select all
# Explicit imports keep the names from the two packages apart
from spire.doc import Document, Paragraph, Table
from spire.xls import Workbook, ExcelVersion

doc = Document()
doc.LoadFromFile("tableName.docx")

# Create a workbook object
wb = Workbook()
# Clear all worksheets in the workbook
wb.Worksheets.Clear()
# Create an empty worksheet in the workbook
worksheet = wb.CreateEmptySheet()

row = 1


def ExportTableInExcel(worksheet, start_row, table):
    """Write the table starting at start_row and return the next free row."""
    row = start_row
    for index in range(table.Rows.Count):
        column = 1
        for k in range(table.Rows[index].Cells.Count):
            tb_cell = table.Rows[index].Cells[k]
            CopyContentInTable(tb_cell, worksheet.Range[row, column])
            column += 1
        row += 1
    return row


def CopyContentInTable(tbCell, cell):
    """Collect the content of a Word table cell and write its text to an Excel cell."""
    newPara = Paragraph(tbCell.Document)
    # Traverse the child objects of the table cell
    for i in range(tbCell.ChildObjects.Count):
        # Retrieve the current child object
        documentObject = tbCell.ChildObjects[i]
        # Only paragraph children are copied
        if isinstance(documentObject, Paragraph):
            paragraph = Paragraph(documentObject)
            for cObj in range(paragraph.ChildObjects.Count):
                # Clone each child object into the new paragraph
                newPara.ChildObjects.Add(paragraph.ChildObjects.get_Item(cObj).Clone())
            # If it's not the last child object, add a line break
            if i < tbCell.ChildObjects.Count - 1:
                newPara.AppendText("\n")
    CopyText(cell, newPara)


def CopyText(cell, paragraph):
    richText = cell.RichText
    richText.Text = paragraph.Text


def getTitle(section, table):
    """Return the text of the paragraph directly before the table, or None."""
    tableIndex = section.Body.ChildObjects.IndexOf(table)
    if tableIndex < 1:
        return None
    documentObject = section.Body.ChildObjects.get_Item(tableIndex - 1)
    if isinstance(documentObject, Paragraph):
        return Paragraph(documentObject).Text
    return None


def groupTitle(tableName):
    """Determine whether to add a new sheet."""
    return tableName is not None and tableName.endswith("(1/3)")


# Traverse every section in the document
for i in range(doc.Sections.Count):
    section = doc.Sections.get_Item(i)
    tables = section.Tables

    for index in range(tables.Count):
        # Get the table
        table = tables[index] if isinstance(tables[index], Table) else None
        if table is None:
            continue
        title = getTitle(section, table)
        if groupTitle(title):
            worksheet = wb.CreateEmptySheet()
            row = 1
        # As posted, the first table of each section (index 0) is not exported
        if index > 0:
            # Export the table and update the row counter
            row = ExportTableInExcel(worksheet, row, table)
        # Automatically adjust the height of all rows in the worksheet
        worksheet.AllocatedRange.AutoFitRows()
        # Automatically adjust the width of all columns in the worksheet
        worksheet.AllocatedRange.AutoFitColumns()
        # Set automatic text wrapping for all cells in the worksheet
        worksheet.AllocatedRange.IsWrapText = True

wb.SaveToFile("Output/WordToExcel.xlsx", ExcelVersion.Version2013)

Sincerely,
Amin
E-iceblue support team
Last edited by Amin.Gan on Thu Oct 31, 2024 3:26 am, edited 1 time in total.

Amin.Gan
 
Posts: 306
Joined: Mon Jul 15, 2024 5:40 am

Wed Oct 30, 2024 3:01 pm

Thanks again Amin for your response!

The code works fine except -
1) We want the 3 tables (1/3, 2/3, and 3/3) combined horizontally (instead of vertically), i.e. ADD the COLUMNS (not ROWS) of the 3 tables together. In other words, if the 3 tables each have 3 columns, the combined table should end up with 9 columns (and the same number of rows as each of the original tables).
2) In the combined output tables, there are duplicated header rows (they come from the 'continued' sections of the same table). It would be great if the duplicates could be removed. See the attached 'duplicated_header_rows'.
3) The first sheet ('Sheet4') actually contains 2 tables from the doc (the 1st table has just one row). I tried a little to separate them in the code but couldn't figure it out. If you can fix it, that would be great. See the attached 'incorrect_tables'.
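
To make point 1 concrete, here is a minimal sketch of a side-by-side merge that works on plain Python lists of rows rather than on Spire objects. The function name `combine_horizontally` and the sample data are made up for illustration; short or missing rows are padded with empty strings so each table keeps its own columns:

```python
from itertools import zip_longest


def combine_horizontally(tables):
    """Join tables side by side: row i of the result is row i of every table, concatenated."""
    # Width of each table = its longest row.
    widths = [max((len(r) for r in t), default=0) for t in tables]
    combined = []
    for rows in zip_longest(*tables, fillvalue=None):
        out = []
        for row, width in zip(rows, widths):
            row = list(row) if row is not None else []
            # Pad short or missing rows so later tables stay in their own columns.
            out.extend(row + [""] * (width - len(row)))
        combined.append(out)
    return combined


part1 = [["ID", "Name"], ["1", "alpha"], ["2", "beta"]]
part2 = [["Version"], ["1.0"], ["2.0"]]
part3 = [["Owner", "Notes"], ["ann", "ok"]]   # one row short on purpose
merged = combine_horizontally([part1, part2, part3])
```

With the cell texts collected into lists like these, each value of `merged` could then be written to the worksheet cell at the matching row and column, in the same way the posted code writes to `worksheet.Range[row, column]`.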


Thank you!

pbsd92128
 
Posts: 11
Joined: Tue Oct 08, 2024 9:14 pm

Wed Oct 30, 2024 3:10 pm

To elaborate a little more -

To avoid duplicated header rows, the code should keep the header row only when the table title (or name) contains '1/3', '2/3', or '3/3' (alternatively, skip the header row if the table title/name contains "continued from previous page").
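
A small sketch of the second rule, assuming that every continuation table repeats the header as its first row and that the title text has already been read. The helper name `rows_to_export` is hypothetical and the rows are plain lists, not Spire objects:

```python
CONTINUED = "continued from previous page"


def rows_to_export(title, rows):
    """Return the rows to copy, dropping the repeated header of a continuation table."""
    if title and CONTINUED in title.lower():
        return rows[1:]   # first row is assumed to be the repeated header
    return rows


first = rows_to_export("Table 2.1 - Versions (1/3)", [["ID", "Name"], ["1", "a"]])
cont = rows_to_export("Table 2.1 - continued from previous page", [["ID", "Name"], ["2", "b"]])
untitled = rows_to_export(None, [["x"]])
```

Checking for the "continued" phrase rather than for the part markers keeps the rule working for tables that are not split into 1/3, 2/3, 3/3 at all.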

pbsd92128
 
Posts: 11
Joined: Tue Oct 08, 2024 9:14 pm

Wed Oct 30, 2024 3:28 pm

I just noticed that one column ('Displayed Version') in the doc lost 'Displayed' from its name after being exported to the spreadsheet. See the attached screenshot.

Thanks!

pbsd92128
 
Posts: 11
Joined: Tue Oct 08, 2024 9:14 pm

Fri Nov 01, 2024 1:59 am

Hello,

Thanks for your feedback. I have adjusted the code according to your requirements, please refer to the attachment. If you have any further questions, please feel free to provide feedback.

Sincerely,
Amin
E-iceblue support team

Amin.Gan
 
Posts: 306
Joined: Mon Jul 15, 2024 5:40 am

Fri Nov 01, 2024 2:35 pm

Hi Amin,

Thank you for the updated code!

It's my fault that I didn't explain clearly - the broken tables in 1/3 (i.e. the table whose title has 1/3 AND any tables after it with '-continued from previous page', before the 2/3 table) should be added vertically, as rows. I'm attaching screenshots showing the adjustment.
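
The grouping I have in mind can be sketched on plain (title, rows) pairs, independent of Spire. The names `group_tables`, `PART` and `CONTINUED` are made up for this illustration, and the title patterns are assumptions based on the sample document ("(n/m)" in the title of a split table, "continued from previous page" in a continuation title):

```python
import re

PART = re.compile(r"\((\d+)/(\d+)\)")          # e.g. "(2/3)"
CONTINUED = "continued from previous page"


def group_tables(titled_tables):
    """Return a list of groups; each group is a list of parts; each part is a list of rows."""
    groups = []
    for title, rows in titled_tables:
        title = title or ""
        if CONTINUED in title.lower() and groups:
            # Continuation: append below the current part, without its repeated header.
            groups[-1][-1].extend(rows[1:])
            continue
        match = PART.search(title)
        if match and int(match.group(1)) > 1 and groups:
            # Part 2/3 or 3/3: a new part of the current group.
            groups[-1].append(list(rows))
        else:
            # Part 1/N or an unsplit table: start a new group.
            groups.append([list(rows)])
    return groups


tables = [
    ("Table 1 - Summary", [["k", "v"], ["a", "1"]]),
    ("Table 2 - Versions (1/3)", [["ID"], ["1"]]),
    ("Table 2 - continued from previous page", [["ID"], ["2"]]),
    ("Table 2 - Versions (2/3)", [["Name"], ["x"], ["y"]]),
    ("Table 2 - Versions (3/3)", [["Owner"], ["p"], ["q"]]),
]
groups = group_tables(tables)
```

Each group would become one sheet: the rows inside a part are already stacked vertically, and the parts of a group are what should end up next to each other as columns.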

If you can help update the code, that will be great!


Thank you!

pbsd92128
 
Posts: 11
Joined: Tue Oct 08, 2024 9:14 pm

Fri Nov 01, 2024 3:40 pm

Also, I tried to find more technical documentation for Spire products, but no luck.

For example, you created an empty sheet with the following code. I would like to assign the original table name as the sheet name, but I don't know how.

Code: Select all
worksheet = wb.CreateEmptySheet()

I hope it could be as easy as -

Code: Select all
worksheet = wb.CreateEmptySheet('sheet_name')


or

Code: Select all
worksheet = wb.CreateEmptySheet().Name('sheet_name')

but I can't find any technical documentation to refer to.
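
One thing to watch for whichever call ends up setting the name: Excel limits worksheet names to 31 characters and rejects the characters \ / ? * [ ] :, and table titles such as "... (1/3)" contain a slash. Below is a sketch of a helper that turns a table title into an acceptable name. `to_sheet_name` is a made-up function, and assigning the result through a `Name` property on the worksheet is an assumption to verify against the API reference:

```python
import re

INVALID_SHEET_CHARS = re.compile(r"[\\/?*\[\]:]")
MAX_SHEET_NAME = 31  # Excel's limit on worksheet name length


def to_sheet_name(title, used=()):
    """Turn a table title into a valid worksheet name that is not already in `used`."""
    name = INVALID_SHEET_CHARS.sub("-", title).strip().strip("'")
    name = name[:MAX_SHEET_NAME] or "Sheet"
    # Sheet names must be unique within a workbook, ignoring case.
    taken = {u.lower() for u in used}
    base, n = name, 2
    while name.lower() in taken:
        suffix = f" ({n})"
        name = base[:MAX_SHEET_NAME - len(suffix)] + suffix
        n += 1
    return name


sheet_name = to_sheet_name("Table 2.1: Versions (1/3)")
# Assumption, not confirmed in this thread: worksheet.Name = sheet_name
```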


Thank you!

pbsd92128
 
Posts: 11
Joined: Tue Oct 08, 2024 9:14 pm

Tue Nov 05, 2024 2:52 am

Hello,

Thanks for your inquiry. According to your requirements, I have attached the adjusted code for your reference. Additionally, you can refer to the following technical documents and feel free to provide feedback if you have any further questions.

Sincerely,
Amin
E-iceblue support team

Amin.Gan
 
Posts: 306
Joined: Mon Jul 15, 2024 5:40 am
