And here is the long-winded route: use the mech to save the page as a PDF, then run pdftotext (Linux command line) to extract the text (it comes out all mixed up, so good luck):
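...
# render the current page to PDF (raw bytes)
my $pdf_data = $mech->content_as_pdf( format => 'A0' );
# write the bytes to disk unmodified
open(my $fh, '>:raw', 'the.pdf') or die $!;
print $fh $pdf_data;
close $fh;
# with no output name given, pdftotext writes the text to the.txt
`pdftotext 'the.pdf'`;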
Note that 'A0' paper size ...
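The same steps can be wrapped in a small helper that checks each step and reads the extracted text back into Perl instead of leaving it in a .txt file. This is only a sketch: `pdftotext_cmd` and `pdf_data_to_text` are hypothetical names, not part of any module, and it assumes a Unix-like system with pdftotext on the PATH. The `-layout` option asks pdftotext to keep the physical layout of the page, which may or may not help with the mixed-up text.

```perl
use strict;
use warnings;

# Hypothetical helper: build the pdftotext argument list.
sub pdftotext_cmd {
    my ($pdf_file, %opt) = @_;
    my @cmd = ('pdftotext');
    push @cmd, '-layout' if $opt{layout};   # try to preserve the physical layout
    push @cmd, $pdf_file, '-';              # '-' sends the text to stdout
    return @cmd;
}

# Hypothetical helper: write raw PDF bytes to a file, run pdftotext on it,
# and return whatever pdftotext printed (undecoded bytes).
sub pdf_data_to_text {
    my ($pdf_data, $pdf_file, %opt) = @_;

    open(my $fh, '>:raw', $pdf_file) or die "Cannot write $pdf_file: $!";
    print {$fh} $pdf_data;
    close($fh) or die "Cannot close $pdf_file: $!";

    # List-form pipe open: no shell is involved, so no quoting problems.
    my @cmd = pdftotext_cmd($pdf_file, %opt);
    open(my $out, '-|', @cmd) or die "Cannot run pdftotext: $!";
    my $text = do { local $/; <$out> };     # slurp all output
    close($out) or die "pdftotext failed (exit status $?)";

    return $text;
}

# Usage, assuming $mech is the object from the snippet above:
#   my $text = pdf_data_to_text(
#       $mech->content_as_pdf( format => 'A0' ), 'the.pdf', layout => 1 );
```

Passing the command as a list avoids the shell quoting that the backtick version relies on, and the `die` calls make a missing pdftotext or an unwritable file visible instead of failing silently.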
In reply to Re^2: Module to extract text from HTML by bliako, in thread Module to extract text from HTML by Bod