in reply to Re: Extracting text from MS Word files on a Linux box
in thread Extracting text from MS Word files on a Linux box
Have you tried strings? Always used to do the trick before the MS format changed.
docx is just a ZIP archive containing a bunch of XML files plus some miscellaneous files. strings will fail on the .docx itself because the contents are compressed, but once the archive is unpacked, strings will happily dig through the XML files.
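If you would rather not wade through raw XML with strings, here is a rough sketch of the same idea in Python, using only the standard library. It assumes the usual docx layout, where the main body lives in word/document.xml and visible text sits in w:t elements inside w:p paragraphs. The function name docx_text is made up for this example, and headers, footers, footnotes and tables are not handled specially.

```python
import zipfile
from xml.etree import ElementTree

# WordprocessingML namespace used by the w:p (paragraph) and w:t (text) elements.
W_NS = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"


def docx_text(path):
    """Return the body text of a .docx file, one paragraph per line.

    Hypothetical helper: reads only word/document.xml (assumed layout).
    `path` may be a filename or a binary file-like object.
    """
    # A .docx is a ZIP archive, so unpack the main document part in memory.
    with zipfile.ZipFile(path) as archive:
        xml_bytes = archive.read("word/document.xml")

    root = ElementTree.fromstring(xml_bytes)
    paragraphs = []
    for para in root.iter(W_NS + "p"):
        # A paragraph's text is split across one or more w:t runs.
        runs = [node.text or "" for node in para.iter(W_NS + "t")]
        paragraphs.append("".join(runs))
    return "\n".join(paragraphs)
```

Calling print(docx_text("report.docx")) on a file of your own should print the body text. If word/document.xml is missing, archive.read raises KeyError, which usually means the file is not a docx (for example an old binary .doc).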
Alexander