in reply to PDF, DOC, etc to HTML or directly to text?
Um, the HTTP response is the contents of the file. If the file is a PDF, a DOC, an image, and so on, there is no plain text in it, so yes, you need modules or programs to convert each format to text.
If I do need, then which modules?
CPAN is full of candidates you'll have to sort through :) To convert images to text you need OCR software. It's probably easier to simply leverage Google APIs (or Google Desktop?).
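As a rough illustration of "convert each to text", here is a minimal sketch that picks an external converter based on the Content-Type header of the response. Everything named in it is an assumption for the example, not something from the question: the helper names (media_type, converter_command) are hypothetical, and the tools (pdftotext from Poppler, antiword, tesseract) are just one possible choice that you would have to install yourself. It only builds the command; it does not run it.

```perl
use strict;
use warnings;

# MIME type => code ref that builds the converter command as a list.
# The tool choices are assumptions: pdftotext (Poppler), antiword and
# tesseract are external programs, not part of Perl or CPAN.
my %CONVERTER_FOR = (
    'application/pdf'    => sub { ('pdftotext', '-layout', $_[0], '-') },
    'application/msword' => sub { ('antiword', $_[0]) },
    'image/png'          => sub { ('tesseract', $_[0], 'stdout') },
    'image/jpeg'         => sub { ('tesseract', $_[0], 'stdout') },
);

# Strip parameters such as "; charset=UTF-8" and normalise case.
sub media_type {
    my ($content_type) = @_;
    my ($type) = split /;/, $content_type // '';
    $type //= '';
    $type =~ s/^\s+|\s+$//g;
    return lc $type;
}

# Return the command (as a list) that should turn $file into plain text,
# or an empty list when no converter is registered for that type
# (for example text/html, which is already text).
sub converter_command {
    my ($content_type, $file) = @_;
    my $builder = $CONVERTER_FOR{ media_type($content_type) } or return;
    return $builder->($file);
}

# Usage: decide what to run for a downloaded file.
my @cmd = converter_command('application/pdf; charset=binary', 'report.pdf');
print "@cmd\n" if @cmd;
```

If you go this way, the returned list can be handed to a list-form piped open (open my $fh, '-|', @cmd) so the file name never passes through a shell. A pure-Perl CPAN module could replace any of the entries in the table without changing the rest.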
Re^2: PDF, DOC, etc to HTML or directly to text?
by Anonymous Monk on Jan 01, 2010 at 22:46 UTC
by vit (Friar) on Jan 02, 2010 at 17:36 UTC