in reply to PDF, DOC, etc to HTML or directly to text?

Do I need to convert PDFs, DOCs, etc. to HTML first, or is there a module (or modules) which can get the text from the HTTP response?

Um, the HTTP response is the content of the file. If the file is a PDF/DOC/image... there is no plain text in it, so yes, you need modules/programs to convert each format to text.
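As a rough sketch of what "convert each to text" could look like, you can dispatch on the response's Content-Type header. Everything here is illustrative: the table %command_for and the subs normalize_type and converter_command are made-up names, and pdftotext and antiword are external programs that are assumed to be installed, not Perl modules.

```perl
use strict;
use warnings;

# Illustrative table: MIME type => code ref that builds a converter command.
# pdftotext and antiword are assumed to be installed separately.
my %command_for = (
    'application/pdf'    => sub { ('pdftotext', $_[0], '-') },
    'application/msword' => sub { ('antiword', $_[0]) },
);

# Drop parameters such as "; charset=UTF-8", trim, and lowercase the type.
sub normalize_type {
    my ($header) = @_;
    return '' unless defined $header && length $header;
    my ($type) = split /;/, $header, 2;
    $type =~ s/^\s+|\s+$//g;
    return lc $type;
}

# Return the command (as a list) that should print the text of $file,
# or an empty list when no converter is known for that content type.
sub converter_command {
    my ($content_type, $file) = @_;
    my $builder = $command_for{ normalize_type($content_type) } or return;
    return $builder->($file);
}
```

The idea would be to save the response body to a temporary file, call converter_command with the Content-Type header and that file name, and run the returned command to capture the text. An empty list means the body is either already text or needs something else (OCR for images, for example).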

If I do need to, then which modules?

CPAN is full of candidates you'll have to sort through :) To convert images to text you need to use OCR software... it's probably easier to simply leverage Google APIs (or Google Desktop?).


Re^2: PDF, DOC, etc to HTML or directly to text?
by Anonymous Monk on Jan 01, 2010 at 22:46 UTC
Thanks!! SWISH::Filter looks good, but how should I process a URL?