in reply to Indexing of Word documents

As you are indexing the data from Word, the exact page number does not matter, given the reasons stated by davies. A better solution is to index the document and use a paragraph counter to record where each keyword occurs. The number of paragraphs remains the same regardless of how the document reflows. Only an edit to the document can change the paragraph count.
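
A minimal sketch of the paragraph-counter idea, in Python for illustration. It assumes the paragraphs have already been extracted from the document into a list of strings; the function name `build_paragraph_index` and the case-insensitive substring match are illustrative choices, not anything prescribed in this thread.

```python
from collections import defaultdict


def build_paragraph_index(paragraphs, keywords):
    """Map each keyword to the 1-based numbers of the paragraphs containing it.

    `paragraphs` is assumed to be a list of strings, one per paragraph,
    in document order. Matching is a case-insensitive substring test.
    """
    index = defaultdict(list)
    for number, text in enumerate(paragraphs, start=1):
        lowered = text.lower()
        for keyword in keywords:
            if keyword.lower() in lowered:
                index[keyword].append(number)
    return dict(index)


if __name__ == "__main__":
    sample = [
        "Bob Jones arrived.",
        "Nothing here.",
        "Later, bob jones left with Ann Lee.",
    ]
    print(build_paragraph_index(sample, ["Bob Jones", "Ann Lee"]))
```

Because the counter depends only on paragraph order, the result stays the same when the document reflows and changes only when the text itself is edited.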

Re^2: Indexing of Word documents
by axiomcrs (Initiate) on Jun 10, 2013 at 19:10 UTC
    Thanks for all the suggestions. Here are some more details. This script is to create an index for a book. The Word files will only reside on one computer, so the issues with changing computers and different printers go away. Using paragraphs does not work, since any paragraph could span two pages, and then the page number associated with a name would be wrong. I am not forced to do this with Word, so changing to PDF could be an option, since an index for a book can be provided with a PDF file. I was asking about using PDF, but could not determine whether page numbers are associated with the text. For instance, if I search for "bob jones" in the PDF file, is there metadata that tells which page that name appears on?
      Hey flexvault, does the pdftohtml program give page numbers as metadata for the text?
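
On the page-number question: a PDF stores its text page by page, so a page number can be recovered by extracting the text one page at a time. The sketch below, in Python for illustration, assumes the text has already been extracted with a tool such as `pdftotext`, which separates pages with a form feed character. The function name `build_page_index` is hypothetical, and a name split across a page boundary would be missed.

```python
import re
from collections import defaultdict


def build_page_index(text, names):
    """Map each name to the 1-based page numbers on which it appears.

    `text` is assumed to be the extracted text of the whole PDF, with
    pages separated by form feed characters ("\f").
    """
    index = defaultdict(list)
    for page_number, page_text in enumerate(text.split("\f"), start=1):
        # Collapse line breaks and runs of spaces so that a name wrapped
        # across two lines on the same page still matches.
        normalized = re.sub(r"\s+", " ", page_text).lower()
        for name in names:
            if name.lower() in normalized:
                index[name].append(page_number)
    return dict(index)


if __name__ == "__main__":
    extracted = "Bob\nJones was here.\fNo names.\fAnn Lee met bob jones.\f"
    print(build_page_index(extracted, ["Bob Jones", "Ann Lee"]))
```

The page numbers found this way are positions in the PDF file (first page is 1), which may differ from the numbers printed on the pages if the book has front matter.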