in reply to Re^4 perl variant of linux tool 'strings'
in thread Perl variant of linux tool strings

PDF and PostScript are probably not the right formats for getting "rich text" information, since they tend to split text up into fragments to accommodate the layout of images etc.

Getting Word (or whatever) to write out RTF might be easier. HTML would work too, though I don't recommend it: you would need at least HTML Tidy to clean it up (use its word-2000 option).
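
If you do go the HTML route, here is a rough sketch of pulling search keywords out of the exported file once it has been cleaned up. Everything in it is my own assumption rather than anything from your script: the sub names html_to_text and keywords, the stop-word list, and the three-letter minimum are all made up for illustration. The regex tag stripping is deliberately crude; for real input a proper parser from CPAN (HTML::Parser, HTML::TreeBuilder) is the safer choice.

```perl
use strict;
use warnings;

# Hypothetical stop-word list; extend it to suit your data.
my %stop = map { $_ => 1 }
    qw(and the for with this that from are was not but you have);

# Crude HTML-to-text conversion. Good enough for a sketch,
# not robust against arbitrary markup.
sub html_to_text {
    my ($html) = @_;
    $html =~ s/<!--.*?-->//gs;                        # drop comments
    $html =~ s/<(script|style)\b.*?<\/\1\s*>//gis;    # drop script/style blocks
    $html =~ s/<[^>]*>/ /g;                           # drop remaining tags
    $html =~ s/&nbsp;/ /g;                            # decode a few common entities
    $html =~ s/&lt;/</g;
    $html =~ s/&gt;/>/g;
    $html =~ s/&amp;/&/g;
    return $html;
}

# Return up to $max of the most frequent words (3+ letters, lowercased,
# stop words skipped), most frequent first, ties broken alphabetically.
sub keywords {
    my ($text, $max) = @_;
    my %count;
    while ($text =~ /([A-Za-z]{3,})/g) {
        my $word = lc $1;
        next if $stop{$word};
        $count{$word}++;
    }
    my @sorted = sort { $count{$b} <=> $count{$a} || $a cmp $b } keys %count;
    $max = @sorted if !defined $max || $max > @sorted;
    return @sorted[0 .. $max - 1];
}

# Small demonstration on an inline string instead of a real file.
my $demo_html = '<p>Monks like <b>Perl</b>, and Perl likes monks.</p>';
print join(', ', keywords(html_to_text($demo_html), 5)), "\n";
```

Counting word frequency is only the simplest possible ranking; it is meant as a starting point for building the keyword list, not as a finished indexer.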

Re^6: Perl variant of linux tool strings
by jeanluca (Deacon) on Mar 24, 2005 at 07:23 UTC
    Thanks a lot Monks!!
    Just to make clear why I need all this: I wrote a CGI script that manages a database.
    It lets you add information to the database and retrieve it again (via a search form). New information is also added via a form, and it might contain HTML, just like this one. So you might post something like:

    a href = my_word document.doc ... etc

    When a post links to a Word document like that, I would like to be able to read the document and create the search keywords for that post from it.
    But my impression is that this is not so easy!!
    Luca