
For a Perl solution, you can try PDF::FromHTML — if you can get it to install. :-(
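If the install does succeed, basic usage follows the module's documented synopsis (new, load_file, convert, write_file). Here is a minimal sketch, assuming PDF::FromHTML is installed and that the page has already been saved locally. The file names page.html and page.pdf and the helper html_to_pdf are hypothetical, for illustration only.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Convert a saved HTML file to PDF with PDF::FromHTML.
# Returns 1 on success, 0 if the module is not installed.
sub html_to_pdf {
    my ($html_file, $pdf_file) = @_;

    # Load the module at run time so a failed install gives a clean message.
    unless (eval { require PDF::FromHTML; 1 }) {
        warn "PDF::FromHTML is not available: $@";
        return 0;
    }

    my $pdf = PDF::FromHTML->new(encoding => 'utf-8');
    $pdf->load_file($html_file);
    $pdf->convert(LineHeight => 10);    # optional layout setting
    $pdf->write_file($pdf_file);
    return 1;
}

# Hypothetical file names; only runs if the input file exists.
html_to_pdf('page.html', 'page.pdf') if -e 'page.html';
```

Loading the module with require inside eval means the script reports a missing install instead of dying at compile time.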

For automated, non-Perl solutions, you can look at something like HTMLDOC (free, but you have to build it from source), or Doxillion Document Converter (not free).
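HTMLDOC is a command-line tool, so it can still be driven from a Perl script. A rough sketch, assuming htmldoc is already built and on your PATH: the helper htmldoc_command and the file names are hypothetical, and the script only prints the command unless you uncomment the system call.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Build the argument list for an htmldoc run.
# --webpage treats the input as a plain web page (no book structure);
# -f names the output file, and the .pdf extension selects PDF output.
sub htmldoc_command {
    my ($source, $pdf_file) = @_;
    return ('htmldoc', '--webpage', '-f', $pdf_file, $source);
}

# Hypothetical input and output names, for illustration only.
my @cmd = htmldoc_command('page.html', 'page.pdf');
print "Would run: @cmd\n";

# Uncomment to run it for real (list form avoids shell quoting problems):
# system(@cmd) == 0 or die "htmldoc failed: $?";
```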

But you’ll probably get the best results by manually saving (or “printing”) the page to PDF format in your browser. For example, in Google Chrome select Print..., then under Destination click the Change button and select Save as PDF. In Firefox, install the “Save as PDF” add-on, which places a “Save as PDF by pdfcrown.com” button on the address bar.

You may be able to automate this browser-based approach from Perl via a module such as WWW::Mechanize::Firefox; but that’s way outside my experience.
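A different route to the same end, not WWW::Mechanize::Firefox, is to call a Chrome build that supports headless mode with its --print-to-pdf switch. The sketch below only builds and prints the command. The binary name google-chrome is an assumption (it varies by system), and the helper chrome_pdf_command, the URL, and the output name are hypothetical.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Build a headless-Chrome command that prints a URL to a PDF file.
# The default binary name is an assumption; it differs between systems
# (for example chromium or chrome.exe).
sub chrome_pdf_command {
    my ($url, $pdf_file, $chrome) = @_;
    $chrome //= 'google-chrome';
    return ($chrome, '--headless', '--disable-gpu',
            "--print-to-pdf=$pdf_file", $url);
}

# Hypothetical URL and output name, for illustration only.
my @cmd = chrome_pdf_command('https://example.com/', 'page.pdf');
print "Would run: @cmd\n";

# Uncomment to run it for real:
# system(@cmd) == 0 or die "Chrome failed: $?";
```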

Hope that helps,

Athanasius <°(((><contra mundum Iustus alius egestas vitae, eros Piratica,
