in reply to Re^2: Scraping Webpage
in thread Scraping Webpage
In my humble defense:
I wrote an entire web page that would elicit HEAD, and every other request available in the HTTP 1.0/1.1 spec, including downloading the entire page. This included sanitizing input, creating the form fields, and adding graphics and CSS. I completed the entire page in under 5 minutes, and I chose LWP, and only LWP. Why? Because, in spite of your assertion, WWW::Mechanize adds complexity and overhead in this scenario. The OP's request is a bone-headed/dead-simple one, exactly what LWP was made for.
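For what it's worth, here is a rough sketch of that kind of plain-LWP exchange (the URL is made up and the details are illustrative, not a drop-in for the OP's page):

```perl
#!/usr/bin/perl
use strict;
use warnings;
use LWP::UserAgent;

# Hypothetical URL, for illustration only.
my $url = 'http://www.example.com/';

my $ua = LWP::UserAgent->new( timeout => 10 );

# A HEAD request first, then a full GET -- plain LWP handles both.
my $head = $ua->head($url);
print 'HEAD: ', $head->status_line, "\n";

my $get = $ua->get($url);
die 'GET failed: ' . $get->status_line . "\n" unless $get->is_success;

my $html = $get->decoded_content;
print 'Fetched ', length($html), " bytes\n";
```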
In fact, completing the OP's request would have required only one additional module: HTML::Restrict (and there are others). The module I listed will strip the HTML tags of your choosing, leaving the OP with an easily controlled/formatted document to display however the OP wishes.
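A minimal sketch of that tag-stripping step with HTML::Restrict (the rules and the sample markup are invented for illustration; with no rules at all it strips every tag):

```perl
use strict;
use warnings;
use HTML::Restrict;

# Keep only the tags/attributes listed in the rules; everything else is stripped.
my $hr = HTML::Restrict->new(
    rules => {
        b => [],          # keep <b>, drop its attributes
        a => ['href'],    # keep <a>, but only its href attribute
    },
);

# Sample markup invented for illustration.
my $html  = '<div><a href="/foo" onclick="x()">link</a> and <b>bold</b> text</div>';
my $clean = $hr->process($html);

print $clean, "\n";    # <a href="/foo">link</a> and <b>bold</b> text
```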
I hope this provides some insight for the OP.
--Chris
```
#!/usr/bin/perl -Tw
use Perl::Always or die;
my $perl_version = (5.12.5);
print $perl_version;
```
Replies are listed 'Best First'.

- Re^4: Scraping Webpage by Anonymous Monk on Nov 19, 2013 at 22:10 UTC
- by taint (Chaplain) on Nov 19, 2013 at 22:34 UTC
- by Your Mother (Archbishop) on Nov 20, 2013 at 02:31 UTC
- by taint (Chaplain) on Nov 20, 2013 at 04:02 UTC