PerlMonks  

Re: web page source?

by Falkkin (Chaplain)
on Feb 22, 2001 at 06:48 UTC [id://60119]


in reply to web page source?

To get the source, I'd get LWP::Simple from CPAN. The code to get your source would then be a simple 2-liner:
    use LWP::Simple;
    my $source = get("http://whatever.url.you/want/to/view.html");
You only need the "use" line once in your program; call get() every time you need the source of a page. Note that get() returns undef if the fetch fails, so check the result before using it.
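
If you're fetching more than one page, it may be worth wrapping get() so the failure check lives in one place. Here's a rough sketch of how that could look. The helper name fetch_source() is my own invention (it is not part of LWP::Simple), and taking the URLs from the command line is just an assumption for the sake of the example:

```perl
#!/usr/bin/perl
use strict;
use warnings;
use LWP::Simple;   # exports get()

# fetch_source() is a hypothetical helper name, not part of LWP::Simple.
# get() returns undef when the fetch fails, so check before using the result.
sub fetch_source {
    my ($url) = @_;
    my $source = get($url);
    die "Could not fetch $url\n" unless defined $source;
    return $source;
}

# Fetch every URL given on the command line and report its size.
for my $url (@ARGV) {
    my $source = eval { fetch_source($url) };
    if (defined $source) {
        printf "%s: %d bytes\n", $url, length $source;
    }
    else {
        warn $@;
    }
}
```

You'd run it with one or more URLs as arguments; a page that can't be fetched produces a warning instead of stopping the whole run.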

Writing an HTML parser by hand is far from trivial. I'd look at HTML::Parser (also on CPAN) and see if that makes your life easier. I haven't really used HTML::Parser before, but from reading the documentation and playing around for the last 15 minutes, it appears you'd want something like the following, which prints the href of every link on the page:

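    #!/usr/bin/perl -w
    use strict;
    use LWP::Simple;
    use HTML::Parser;

    my $source = get("http://www.perlmonks.org");
    die "Couldn't fetch the page\n" unless defined $source;

    my $parser = HTML::Parser->new(api_version => 3);
    # tagname is lowercased; attr is a hash ref
    $parser->handler(start => \&function, 'tagname, attr');
    $parser->parse($source);
    $parser->eof;

    # Print the href of every <a> tag that has one
    sub function {
        my ($tag_name, $attr_ref) = @_;
        if ($tag_name eq 'a' && defined $attr_ref->{href}) {
            print $attr_ref->{href}, "\n";
        }
    }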
