| [reply] |
Interesting point about docx. I don't think I've tried that. The procedure dies when I try to open password-protected files, but that is not a problem; I won't have them in production.
| [reply] |
Just a small note: docx, like all those other legacy+X-extensions from the newer MS Office versions, is a ZIP file containing XML and some helper files. Perl can unzip, and Perl can do really weird things with XML, so docx and friends are easier to handle than the classic binary garbage formats.
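To make the "ZIP file containing XML" point concrete, here is a minimal sketch. It is written in Python only so that it runs with the standard library alone; the same two steps (unzip, then parse the XML) apply in Perl. The helper names `docx_paragraphs` and `make_sample_docx` are made up for illustration, and the sample archive is a tiny stand-in holding just `word/document.xml`, not a complete file that Word would open.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

# Namespace used by WordprocessingML elements such as w:p and w:t.
W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"


def docx_paragraphs(source):
    """Return the text of each paragraph in a .docx path or file-like object."""
    with zipfile.ZipFile(source) as archive:
        # The main document body lives in this archive member.
        xml_bytes = archive.read("word/document.xml")
    root = ET.fromstring(xml_bytes)
    paragraphs = []
    for para in root.iter(f"{{{W_NS}}}p"):
        # Concatenate all text elements found inside the paragraph.
        paragraphs.append(
            "".join(t.text or "" for t in para.iter(f"{{{W_NS}}}t"))
        )
    return paragraphs


def make_sample_docx():
    """Build a tiny stand-in archive in memory (hypothetical test data)."""
    body = (
        f'<w:document xmlns:w="{W_NS}"><w:body>'
        "<w:p><w:r><w:t>Hello from a zipped XML file.</w:t></w:r></w:p>"
        "</w:body></w:document>"
    )
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        archive.writestr("word/document.xml", body)
    buffer.seek(0)
    return buffer


if __name__ == "__main__":
    print(docx_paragraphs(make_sample_docx()))
```

Passing a real file name instead of the in-memory buffer should work the same way, as long as the file is not password-protected.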
Alexander
--
Today I will gladly share my knowledge and experience, for there are no sweeter words than "I told you so". ;-)
| [reply] |
I imagine you'd face the following problems:
- Character encoding issues
- Document encoding issues
- False positives from non-text components of the document (e.g. if I search for the word "bold", will I find matches that aren't in the document's text?)
- False negatives from text interrupted by formatting codes (e.g. can you match a sentence containing a bolded word?)
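
The last two points can be illustrated with a small sketch, again in Python with the standard library only. It assumes the docx-style layout where a paragraph (`w:p`) is split into runs (`w:r`) whenever the formatting changes; `RAW_XML` is hand-written sample markup and `visible_text` is a hypothetical helper, not part of any library.

```python
import xml.etree.ElementTree as ET

W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"

# One paragraph, three runs; the middle run carries a bold property (w:b).
RAW_XML = (
    f'<w:document xmlns:w="{W_NS}"><w:body><w:p>'
    '<w:r><w:t xml:space="preserve">This is </w:t></w:r>'
    "<w:r><w:rPr><w:b/></w:rPr><w:t>bold</w:t></w:r>"
    '<w:r><w:t xml:space="preserve"> text.</w:t></w:r>'
    "</w:p></w:body></w:document>"
)


def visible_text(xml_string):
    """Join the text of all runs per paragraph, one paragraph per line."""
    root = ET.fromstring(xml_string)
    return "\n".join(
        "".join(t.text or "" for t in para.iter(f"{{{W_NS}}}t"))
        for para in root.iter(f"{{{W_NS}}}p")
    )


text = visible_text(RAW_XML)

# False negative: the sentence is broken up by markup in the raw XML.
print("This is bold text." in RAW_XML)  # False
print("This is bold text." in text)     # True

# False positive: "body" matches an element name, not document text.
print("body" in RAW_XML)  # True
print("body" in text)     # False
```

So searching the extracted text rather than the raw markup avoids both problems in this simple case. It does nothing for the encoding points, and anything stored outside the main document body would need separate handling.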
| [reply] |