in reply to Removing Junk from Files

Well, for your WinWord .doc files, try RTF::Parser (it reads RTF, so you'd need to save the documents as RTF first). Some of your other file formats may well have parser modules available too.

For quick and dirty, sometimes I'll save a file as HTML from the app and then run html2txt on it for the output. Lazy? Yes! But I did automate the process via Win32::OLE.
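
Since my own script isn't to hand, here is a rough sketch of what the save-as-HTML step might look like. It is not the script I used. It assumes Windows with Word 2000 or later installed, and the sub names html_path_for and doc_to_html are made up for illustration. Word generally wants absolute paths.

```perl
use strict;
use warnings;

# Hypothetical helper: derive the output name by swapping the extension.
sub html_path_for {
    my ($doc_path) = @_;
    (my $html_path = $doc_path) =~ s/\.doc$/.html/i;
    # No .doc extension found: append instead, so the input is never overwritten.
    $html_path .= '.html' if $html_path eq $doc_path;
    return $html_path;
}

# Open a document in Word via OLE and save a copy as HTML.
sub doc_to_html {
    my ($doc_path) = @_;
    require Win32::OLE;    # loaded lazily so the file still compiles off Windows

    # 'Quit' is called on the Word object when it goes out of scope.
    my $word = Win32::OLE->new('Word.Application', 'Quit')
        or die "Cannot start Word: " . Win32::OLE->LastError();
    my $doc = $word->Documents->Open($doc_path)
        or die "Cannot open $doc_path: " . Win32::OLE->LastError();

    my $html_path = html_path_for($doc_path);
    $doc->SaveAs($html_path, 8);    # 8 = wdFormatHTML
    $doc->Close(0);                 # 0 = wdDoNotSaveChanges
    return $html_path;
}

# Usage: perl doc2html.pl C:\full\path\to\file.doc
print doc_to_html($ARGV[0]), "\n" if @ARGV;
```

The resulting .html file is what then gets fed to html2txt. Passing 2 (wdFormatText) to SaveAs instead would skip the HTML detour and write plain text directly, at the cost of whatever structure the HTML route preserves.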

Anywho, I'm changing employers today (hurrah!), so I can't provide my script just now.

HTH
--
idnopheq
Apply yourself to new problems without preparation, develop confidence in your ability to meet situations as they arise.

UPDATE: Check out iguane's WORD TO TEXT SIMPLY for the OLE stuff I mentioned.