in reply to Extracting text from MS Word files on a Linux box

Have you tried strings? Always used to do the trick before the MS format changed.

Re^2: Extracting text from MS Word files on a Linux box
by afoken (Chancellor) on Jun 21, 2018 at 20:18 UTC
    Have you tried strings? Always used to do the trick before the MS format changed.

    A .docx file is just a ZIP archive holding a set of XML files plus a few miscellaneous files. strings fails on the archive itself because the contents are compressed, but once the archive is unpacked, strings will happily dig through the XML files.
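
    The unpack-then-read idea can be sketched in a few lines of Python. This is only an illustration, not a full converter: it assumes the body text lives in word/document.xml inside the archive (the usual place in a .docx), and the function name docx_paragraphs is made up for this example.

```python
import zipfile
import xml.etree.ElementTree as ET

# WordprocessingML namespace used by the elements in word/document.xml
W_NS = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"


def docx_paragraphs(source):
    """Return the text of each paragraph found in a .docx file.

    `source` is a filesystem path or a binary file-like object.
    docx_paragraphs is a hypothetical helper name, not a library function.
    """
    # A .docx is a ZIP archive; assume the main text is in word/document.xml.
    with zipfile.ZipFile(source) as archive:
        xml_bytes = archive.read("word/document.xml")

    root = ET.fromstring(xml_bytes)
    paragraphs = []
    # <w:p> is a paragraph; the visible text sits in <w:t> elements inside it.
    for para in root.iter(W_NS + "p"):
        runs = [node.text or "" for node in para.iter(W_NS + "t")]
        paragraphs.append("".join(runs))
    return paragraphs


if __name__ == "__main__":
    import sys

    # Usage: python docx_text.py some_file.docx
    for line in docx_paragraphs(sys.argv[1]):
        print(line)
```

    Unlike strings on the unpacked XML, this drops the markup and keeps only the text runs. Headers, footers and footnotes are stored in other XML files in the archive, so this sketch does not see them.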

    Alexander

    --
    Today I will gladly share my knowledge and experience, for there are no sweeter words than "I told you so". ;-)
Re^2: Extracting text from MS Word files on a Linux box
by Laurent_R (Canon) on Jun 21, 2018 at 11:55 UTC
    I just did not think of it. That's a very good idea, and I'll try it. I don't know how it works under the hood, but I know that the Linux grep command is able to find strings in an MS Word file, so if strings works similarly, it might be all I need.

    Thanks hippo.