
OP, you may have some luck loading the MS Word document into (star|open)office, printing it to PDF and then chucking that at ps2ascii. The exact formatting is the hardest part for *office to get right, and plain ASCII keeps little remnant of it anyway, so I guess you could have a lot of luck.

update

As ambrus points out below, of course, if you can read the Word doc into *office, then you can just export ASCII from there. Sorry, it has been a rather long day.
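For the export route, here is a minimal Perl sketch that drives the conversion from a script instead of by hand. It assumes a modern LibreOffice whose `soffice` binary is on the PATH and accepts `--headless --convert-to txt:Text --outdir DIR` (the OpenOffice releases current when this thread was written may not have had this option). The sub names `soffice_txt_command` and `word_to_text` are hypothetical.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Basename qw(basename);
use File::Spec;

# Build the soffice command line that converts one Word file to plain text.
# Assumption: a LibreOffice 'soffice' binary is available on PATH.
sub soffice_txt_command {
    my ($doc, $outdir) = @_;
    return ('soffice', '--headless', '--convert-to', 'txt:Text',
            '--outdir', $outdir, $doc);
}

# Run the conversion and return the path of the .txt file soffice writes
# (same base name as the input, extension replaced by .txt, in $outdir).
sub word_to_text {
    my ($doc, $outdir) = @_;
    my @cmd = soffice_txt_command($doc, $outdir);
    system(@cmd) == 0 or die "soffice failed for $doc: $?\n";
    (my $name = basename($doc)) =~ s/\.[^.]+\z//;    # strip the extension
    return File::Spec->catfile($outdir, "$name.txt");
}

# Usage when run as a script: perl doc2txt.pl file.doc /tmp
if (!caller && @ARGV) {
    print word_to_text($ARGV[0], $ARGV[1] // '.'), "\n";
}
```

Passing the command to `system` as a list skips the shell, so file names containing spaces need no extra quoting.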

You may also want to trawl through a list of filters; I found this one, which looks like it may have some tools that could help.

Cheers,
R.

Pereant, qui ante nos nostra dixerunt! (May they perish who said our things before us!)

Re^5: Perl variant of linux tool strings
by ambrus (Abbot) on Mar 23, 2005 at 21:30 UTC

    If you can load a document into *office, why don't you save it straight as ASCII text, or at least in some other format that can be parsed easily?

Re^5: Perl variant of linux tool strings
by Joost (Canon) on Mar 23, 2005 at 21:51 UTC
      Thanks a lot, Monks!!
      Just to make clear why I need all this: I wrote a CGI script that manages a database.
      It lets you add information to the database and retrieve it again via a search form. Information is added through a form too, and it may contain HTML, just like this post. So you might post something like:

      a href = my_word document.doc ... etc

      When a post contains a Word document, I would like to read that document and create the search keywords for the post from it.
      But my impression is that this is not so easy!!
      Luca
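A minimal Perl sketch of the `strings` approach this thread is about, assuming the CGI script has already saved the uploaded .doc to disk: it pulls out runs of at least four printable ASCII characters (the same default minimum as `strings`) and reduces them to unique lower-cased words to use as search keywords. The sub names, the stop-word list and the 3-letter word minimum are hypothetical choices. Caveat: Word may store some or all of a document's text as UTF-16, which a plain ASCII scan like this (or `strings` without an encoding option) will miss, and it will also pick up noise such as font and style names.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Like `strings`: return every run of at least $min printable ASCII
# characters (space through tilde, plus tab) found in a binary buffer.
sub printable_runs {
    my ($data, $min) = @_;
    $min //= 4;
    return $data =~ /([\x20-\x7e\t]{$min,})/g;
}

# Hypothetical stop-word list; extend it to suit your documents.
my %STOP = map { $_ => 1 } qw(the and for with that this from are was);

# Turn the runs into a sorted list of unique, lower-cased candidate keywords
# (alphabetic words of 3+ letters that are not stop words).
sub keywords_from_binary {
    my ($data) = @_;
    my %seen;
    for my $run (printable_runs($data)) {
        for my $word ($run =~ /([A-Za-z]{3,})/g) {
            my $lc = lc $word;
            $seen{$lc}++ unless $STOP{$lc};
        }
    }
    return sort keys %seen;
}

# Usage when run as a script: perl doc_keywords.pl upload.doc
if (!caller && @ARGV) {
    open my $fh, '<:raw', $ARGV[0] or die "Cannot open $ARGV[0]: $!\n";
    my $data = do { local $/; <$fh> };
    close $fh;
    print join(' ', keywords_from_binary($data)), "\n";
}
```

Reading the file with `:raw` keeps Perl from translating line endings or decoding bytes, so the scan sees the file exactly as `strings` would.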