Thank you so much! This is very helpful.

Would there be a way for me to save the entire webpage as a PDF instead of an RTF?

I realize that even with RTF, some of the formatting is broken.

Ideally, I would like the webpage to be saved as a PDF and then copied into Microsoft Word.

Again, thank you so much.

Re^3: Browser automation to copy webpage to text
by Athanasius (Archbishop) on Oct 21, 2015 at 07:57 UTC

    For a Perl solution, you can try PDF::FromHTML — if you can get it to install. :-(
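
    If the module does install, a minimal sketch along the lines of its documented synopsis might look like the following. It assumes the page has already been saved locally as an HTML file; the helper names pdf_name_for and html_to_pdf are made up for this example, and I would not expect heavily styled pages to come through intact.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: derive the output name, e.g. page.html -> page.pdf.
sub pdf_name_for {
    my ($html_file) = @_;
    (my $base = $html_file) =~ s/\.html?\z//i;
    return "$base.pdf";
}

# Hypothetical helper: convert one saved HTML file to PDF.
sub html_to_pdf {
    my ($html_file) = @_;

    # Loaded at run time, so the script still compiles
    # when PDF::FromHTML is not installed.
    require PDF::FromHTML;

    my $pdf = PDF::FromHTML->new( encoding => 'utf-8' );
    $pdf->load_file($html_file);
    $pdf->convert( LineHeight => 10 );

    my $pdf_file = pdf_name_for($html_file);
    $pdf->write_file($pdf_file);
    return $pdf_file;
}

# Convert every file named on the command line; do nothing if none is given.
print html_to_pdf($_), "\n" for @ARGV;
```

    Saved as, say, html2pdf.pl (an arbitrary name), it would be run as "perl html2pdf.pl page.html" and should write page.pdf next to the input.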

    For automated, non-Perl solutions, you can look at something like HTMLDOC (free, but you have to build it from source), or Doxillion Document Converter (not free).

    But you’ll probably get the best results by manually saving (or “printing”) the page to PDF format in your browser. For example, in Google Chrome select Print..., then under Destination click the Change button and select Save as PDF. In Firefox, install the “Save as PDF” add-on which places a Save as PDF by pdfcrown.com button on the address bar.

    You may be able to automate this browser-based approach from Perl via a module such as WWW::Mechanize::Firefox; but that’s way outside my experience.

    Hope that helps,

    Athanasius <°(((>< contra mundum
    Iustus alius egestas vitae, eros Piratica,