madM has asked for the wisdom of the Perl Monks concerning the following question:

Hi Monks!
I'm just learning how to scrape web pages using WWW::Mechanize and I have a question. For example, the page https://openbook.etoro.com/dellos/stats/ has text inside "div" tags, but when I fetch the page and try to print all the text, I only get one small sentence and nothing else. Does anybody know how I could print all the text that is, in this case, on this web page?
use strict;
use warnings;
use WWW::Mechanize;

my $mech = WWW::Mechanize->new( autocheck => 1 );
$mech->get( "https://openbook.etoro.com/dellos/stats/" );
print $mech->content( format => 'text' );

Replies are listed 'Best First'.
Re: Scraping Webpage (javascript)
by Anonymous Monk on Nov 19, 2013 at 02:52 UTC
Re: Scraping Webpage
by taint (Chaplain) on Nov 19, 2013 at 03:24 UTC
    Greetings, madM.

    I think LWP is probably better suited to the type of work you're looking for. :)

    Best wishes.

    --Chris

    #!/usr/bin/perl -Tw
    use Perl::Always or die;
    my $perl_version = (5.12.5);
    print $perl_version;

      I think LWP is probably better suited to the type of work you're looking for. :)

      WWW::Mechanize is built on top of LWP, to save you the work of learning how to build WWW::Mechanize out of LWP yourself, because 99 out of 100 newbies who think "I need to scrape this" don't know anything about HTTP and can't wrap their minds around LWP ("I need to save a request/response object? Whaaat?").

        Greetings.

        In my humble defense:
        I once wrote an entire web page that would issue HEAD, and every other request available in the HTTP 1.0 / 1.1 spec, including downloading the entire page. That included sanitizing input, creating the form fields, and adding graphics and CSS. I completed the entire page in under 5 minutes, and I chose LWP, and only LWP. Why? Because, in spite of your assertion, WWW::Mechanize adds complexity and overhead in this scenario. The OP's request is a bone-headed, dead-simple one; it is exactly what LWP was made for.

        In fact, completing the OP's request would have required only one additional module: HTML::Restrict (and there are others). The module I listed will strip the HTML tags of your choice, leaving the OP with an easily controlled and formatted document to display however he wishes.
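        A minimal sketch of that approach (assuming the text you want is present in the static HTML; HTML::Restrict with no rules strips all tags):

        #!/usr/bin/perl
        use strict;
        use warnings;
        use LWP::Simple qw(get);
        use HTML::Restrict;

        # Fetch the raw HTML with plain LWP.
        my $html = get('https://openbook.etoro.com/dellos/stats/')
            or die "Could not fetch the page\n";

        # Strip every HTML tag, leaving only the text content.
        my $hr   = HTML::Restrict->new;    # no allowed-tag rules => remove all tags
        my $text = $hr->process($html);

        print $text;

        Note that for this particular URL most of the interesting text is filled in by JavaScript, so this will still only print what is in the static HTML, which is the small sentence the OP saw.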

        I hope this provides some insight for the OP.

        --Chris

        #!/usr/bin/perl -Tw
        use Perl::Always or die;
        my $perl_version = (5.12.5);
        print $perl_version;
Re: Scraping Webpage
by Discipulus (Canon) on Nov 19, 2013 at 08:21 UTC
    As taint said, it is an LWP task, and often even an LWP::Simple one.

    Many times these tasks can be hacked into a quasi-one-liner, as in:
    perl -e "use LWP::Simple; my $word = $ARGV[0]; map { print qq($_ ) if /<h2.*$word<\/h2>/../Link/ } split /\n/, get('http://dictionary.reference.com/browse/'.$word);"
    The final get (imported by LWP::Simple) fetches the document, which is split into lines; while the flip-flop is true, map does the printing.
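    The same logic, expanded into a small script for readability (the word is taken as the first argument; the URL and the /Link/ end marker are from the one-liner above):

    #!/usr/bin/perl
    use strict;
    use warnings;
    use LWP::Simple qw(get);

    my $word = shift @ARGV or die "usage: $0 word\n";
    my $html = get( 'http://dictionary.reference.com/browse/' . $word )
        or die "Could not fetch the page\n";

    for my $line ( split /\n/, $html ) {
        # Flip-flop: start printing at the <h2>...$word</h2> heading,
        # stop after the first line matching /Link/.
        print "$line\n" if $line =~ /<h2.*$word<\/h2>/ .. $line =~ /Link/;
    }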

    L*
    There are no rules, there are no thumbs..
    Reinvent the wheel, then learn The Wheel; may be one day you reinvent one of THE WHEELS.
      Have you visited the OP's URL? It's full of JavaScript; WWW::Mechanize is built on top of LWP, and neither can help with JavaScript.
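      One way around that is to drive a real browser so the JavaScript actually runs. A sketch using WWW::Mechanize::Firefox (not suggested in the thread; it assumes Firefox is running with the MozRepl extension, which the module talks to):

      #!/usr/bin/perl
      use strict;
      use warnings;
      use WWW::Mechanize::Firefox;   # drives a real Firefox, so JavaScript executes

      # Assumes Firefox is running with the MozRepl extension enabled.
      my $mech = WWW::Mechanize::Firefox->new();
      $mech->get('https://openbook.etoro.com/dellos/stats/');

      # content() returns the DOM after the page's JavaScript has filled it in.
      print $mech->content;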