in reply to Stripping Meta/Control Key, etc

What sort of documents? Word, WordPad, Excel, Acrobat PDFs and so on each have their own document format, and just running through stripping out the characters you don't understand won't get you the document text. Or rather, it probably will, but it will also get you a pile of other stuff that may look like text but is junk: perhaps repeated parts of the same document, perhaps stuff that happened to be in memory when the document was created, perhaps symbol tables from the application. In any case junk, and piles of it.

So, again, what sort of documents and what do you expect to be able to get out of them?


DWIM is Perl's answer to Gödel