I have about 1000 Word documents that are still edited from time to time in Word 2003. They are all text-only (no pictures, graphs or the like) and contain, among other data, names of people. I need to find each name and the page number it appears on, and then create an index appended to the end of a single document that concatenates all the .doc files.
I have written a Perl script using Win32::OLE and have had some success, but Win32::OLE seems poorly documented and somewhat flaky. The script more or less works, but I can't fix the last remaining bugs, which I believe are related to saving the file and to opening and closing the documents.
Is there a better way of doing this? I would prefer to keep the finished document as a Word document, but perhaps that is problematic. Would it be better to extract the text from the Word documents and convert it to a PDF file? I can't determine whether PDF::API2, CAM::PDF or PDF::Core can find a piece of text in a PDF file and report the page number it is on.
Also, would plain text be a better choice?
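For the PDF or plain-text route, the indexing step itself is simple once page boundaries survive extraction. Below is a minimal sketch, assuming the text has already been extracted with pages separated by form-feed characters (`pdftotext` writes one after each page by default) and that the list of names is already known. `build_name_index` is a hypothetical helper, not part of any of the modules mentioned above.

```perl
use strict;
use warnings;

# Hypothetical helper: build a name => [page numbers] index from text
# whose pages are separated by form-feed characters ("\f").
sub build_name_index {
    my ($text, @names) = @_;

    # Empty pages in the middle are kept, so numbering stays correct.
    my @pages = split /\f/, $text;

    # Precompile one pattern per name. quotemeta escapes regex
    # metacharacters; \s+ between name parts lets a name wrapped across
    # a line break still match; the lookarounds stop "Ann" matching
    # inside "Anne" without requiring the name to end in a word char.
    my %re;
    for my $name (@names) {
        my $pat = join '\s+', map { quotemeta } split ' ', $name;
        $re{$name} = qr/(?<!\w)$pat(?!\w)/;
    }

    my %index;
    for my $i (0 .. $#pages) {
        for my $name (@names) {
            push @{ $index{$name} }, $i + 1    # pages are 1-based
                if $pages[$i] =~ $re{$name};
        }
    }
    return \%index;    # names never found get no entry
}

my $text  = "Alice met Bob.\fBob left.\fNobody here.\fAlice returned.";
my $index = build_name_index($text, 'Alice', 'Bob', 'Carol');
for my $name (sort keys %$index) {
    printf "%s: %s\n", $name, join(', ', @{ $index{$name} });
}
# Alice: 1, 4
# Bob: 1, 2
```

The page numbers are only as reliable as the form feeds in the input, so this approach depends on an extraction step that preserves the real pagination.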
I can provide the Perl script I have using Win32::OLE if it would help. Thanks.