I scan a directory tree of Word files to build an index with SWISH-E.
I use Win32::OLE and have M$-Word installed, but I do not need any interaction and Word does not have to be visible. This approach is tied to Windows. Are you dependent on the Windows platform? None of the tools I found were exactly what I needed; many come from the Unix world and depend on those handy GNU libraries, but I am on Windows here.
Be aware that after extracting the text you may still have lots of control characters, which mark table cells and the like. I am not interested in bold or italic text, but I do extract the title and other document properties, user-defined properties and the text.
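To illustrate the control-character cleanup, here is a minimal sketch. It assumes the raw string comes from Word's text property, where, as far as I know, table cells end with CR plus BEL (\x07), paragraphs end with CR, and manual line and page breaks show up as \x0B and \x0C. The name clean_word_text is made up for this example and is not from my indexing script.

```perl
use strict;
use warnings;

# Hypothetical helper: tidy the raw text string that Word hands back.
# The character codes below are assumptions about Word's conventions.
sub clean_word_text {
    my ($text) = @_;
    $text =~ s/\r\x07/\t/g;              # end-of-cell marker (CR + BEL) becomes a tab
    $text =~ s/\x07/\t/g;                # any stray cell marker becomes a tab
    $text =~ s/[\r\x0B\x0C]/\n/g;        # paragraph, line and page breaks become newlines
    $text =~ s/[\x00-\x08\x0E-\x1F]//g;  # drop the remaining control characters
    $text =~ s/[ \t]+\n/\n/g;            # trim whitespace left in front of a newline
    return $text;
}
```

For an indexer this is usually enough, because the table layout itself does not matter, only the words in the cells.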
My solution was straightforward, as with every OLE interaction I have written in Perl: open the application and its macro editor, press F1 to find the functions, record macros, save the VB script, then translate and extend it in Perl, which cuts its length in half.
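A translated macro might end up looking roughly like the sketch below. This is not my actual code, only a starting point under some assumptions: the sub name extract_word_doc and the layout of the returned hash are invented for the example, and it needs Windows with Word installed. Win32::OLE is loaded with require inside the sub so that the file at least compiles elsewhere.

```perl
use strict;
use warnings;

# Sketch: open one document in an invisible Word instance and return
# its title, user-defined properties and raw text.
sub extract_word_doc {
    my ($path) = @_;                      # absolute path to the Word file
    require Win32::OLE;                   # loaded lazily; Windows only

    # 'Quit' is called on the Word object when $word goes out of scope.
    my $word = Win32::OLE->new('Word.Application', 'Quit')
        or die "Cannot start Word: " . Win32::OLE->LastError();
    $word->{Visible} = 0;                 # no window, no interaction

    # Arguments: FileName, ConfirmConversions, ReadOnly
    my $doc = $word->Documents->Open($path, 0, 1)
        or die "Cannot open $path: " . Win32::OLE->LastError();

    my %info = (
        title => $doc->BuiltInDocumentProperties('Title')->{Value},
        text  => $doc->Content->{Text},
    );

    # User-defined properties form a 1-based collection.
    my $custom = $doc->CustomDocumentProperties;
    for my $i (1 .. $custom->{Count}) {
        my $prop = $custom->Item($i);
        $info{custom}{ $prop->{Name} } = $prop->{Value};
    }

    $doc->Close(0);                       # 0 = wdDoNotSaveChanges
    return \%info;
}
```

Called as `my $info = extract_word_doc('C:\docs\report.doc');` (the path is just an example), it returns a hash reference whose text entry still contains Word's control characters. When scanning a whole directory tree it would be cheaper to start Word once and reuse it for every file.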
If somebody is interested, I could post my code as a starting point.
regards Brutha
And it came to pass that in time the Great God Om spake unto Brutha, the Chosen One: "Psst!"
(Terry Pratchett, Small Gods)
In reply to Re: Need help with perl only parsing of M$ word file
by Brutha
in thread Need help with perl only parsing of M$ word file
by dwhitney