Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

I need to find a GOOD regex to pull the links out of the page at a given $url. I know, I know: don't reinvent the wheel, and regexes aren't practical for parsing HTML. I also know some will suggest modules that'll help. Still, I'd prefer a fairly decent regex if anyone has one.
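For comparison, a regex-only extractor might look like the minimal sketch below. It is deliberately narrow: it assumes links live only in `href` attributes of `<a>` tags, it will still match links inside HTML comments or `<script>` blocks, and it leaves entities such as `&amp;` undecoded. Those gaps are why the parser modules are usually recommended. The sub name `regex_links` is hypothetical.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: a simple, regex-only link extractor.
# Assumption: only href attributes on <a> tags matter. It is easily
# fooled by comments, <script> blocks and other legal HTML.
sub regex_links {
    my ($html) = @_;
    my @links;
    while ($html =~ m{
        <a\b [^>]*? \s href \s* = \s*   # <a ... href=
        (?: "([^"]*)"                  # double-quoted value
          | '([^']*)'                  # single-quoted value
          | ([^\s>]+)                  # unquoted value
        )
    }gix) {
        # Exactly one of the three capture groups matched.
        push @links, defined $1 ? $1 : defined $2 ? $2 : $3;
    }
    return @links;
}

my $sample = q{<p><a href="/a.html">A</a> <A HREF='b.html'>B</A> <a class=x href=c.html>C</a></p>};
print "$_\n" for regex_links($sample);
```

Running it prints `/a.html`, `b.html` and `c.html`, one per line. The URLs come back exactly as written in the page, so relative links stay relative.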

Anyway, I found this code in someone else's snippet, but oddly enough, when I run it I don't get back any links. Did I forget to copy a line of code or something?

#!/usr/bin/perl
use warnings;
use strict;
use CGI qw/:standard/;
use CGI::Carp qw(fatalsToBrowser);
use LWP::UserAgent;
use LWP::Simple qw(!head);
use HTML::LinkExtor;
use URI::URL;

my $url = "http://www.test.com";
my $url = shift;
my @links;

my $ua = LWP::UserAgent->new;
my $p  = HTML::LinkExtor->new(\&callback);

# Request document and parse it as it arrives
my $res = $ua->request(HTTP::Request->new(GET => $url),
                       sub {$p->parse($_[0])});

for ($p->links) {
    if (defined $_->[2]) {
        push(@links, $_->[2]);
    }
}

foreach my $link (@links) {
    print "$link<br>";
}

Re: Can't get back links from page
by shmem (Chancellor) on Apr 06, 2007 at 09:38 UTC
    when I run it I don't get back any links.

    But you do get back some errors and warnings (e.g. "my" variable $url masks earlier declaration in same scope, since $url is declared twice); you should fix your code.

    Did I forget to copy a line of code or something?
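    Probably. For example,

    my $p = HTML::LinkExtor->new(\&callback);

    passes a CODE reference to a nonexistent subroutine, callback, into HTML::LinkExtor's new method. And because a callback was supplied at all, the parser hands each link to it instead of storing it, so $p->links will always return an empty list.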
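    A minimal corrected sketch of the question's script, assuming it is run from the command line (the original's shift reads @ARGV, which is usually empty under CGI). The parser is created with no callback so that $p->links has something to return, a base URL is passed so relative links come back absolute, and $p->eof is called after parsing. The helper name extract_links is hypothetical, and the unused CGI, LWP::Simple and URI::URL imports are dropped.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use LWP::UserAgent;
use HTML::LinkExtor;

# Hypothetical helper: return every link URL found in $html.
# No callback is passed to new(), so HTML::LinkExtor stores the links
# itself and $p->links returns them. With $base set, relative URLs are
# returned as absolute URI objects.
sub extract_links {
    my ($html, $base) = @_;
    my $p = HTML::LinkExtor->new(undef, $base);
    $p->parse($html);
    $p->eof;                              # flush anything still buffered
    my @urls;
    for my $link ($p->links) {
        my ($tag, %attr) = @$link;        # [ $tag, attr => url, ... ]
        push @urls, map { "$_" } values %attr;   # stringify URI objects
    }
    return @urls;
}

# Fetch only when a URL is given, e.g.  perl links.pl http://www.example.com/
if (@ARGV) {
    my $url = shift @ARGV;
    my $ua  = LWP::UserAgent->new;
    my $res = $ua->get($url);
    die "GET $url failed: ", $res->status_line, "\n" unless $res->is_success;
    print "$_\n" for extract_links($res->decoded_content, $res->base);
}
```

    This fetches the whole page with get and then parses it, rather than parsing chunk by chunk as the original did; that is simpler, at the cost of holding the page in memory.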

    --shmem

    _($_=" "x(1<<5)."?\n".q·/)Oo.  G°\        /
                                  /\_¯/(q    /
    ----------------------------  \__(m.====·.(_("always off the crowd"))."·
    ");sub _{s./.($e="'Itrs `mnsgdq Gdbj O`qkdq")=~y/"-y/#-z/;$e.e && print}
Re: Can't get back links from page
by Samy_rio (Vicar) on Apr 06, 2007 at 04:16 UTC

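    Hi, try this:

    use strict;
    use warnings;
    use WWW::Mechanize;
    use Data::Dumper;

    # autocheck => 1 makes get() die if the request fails
    my $mech = WWW::Mechanize->new( autocheck => 1 );
    my $url = 'http://www.perlmonks.com/?node_id=608600';
    $mech->get( $url );

    # links() returns a WWW::Mechanize::Link object for each link on the page
    my @links = $mech->links();
    print Dumper @links;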
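    If only the URLs are wanted rather than the full Dumper output, a sketch along these lines prints each link's absolute form via url_abs. The helper name link_urls is hypothetical, and the script only fetches when a URL is given on the command line.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use WWW::Mechanize;

# Hypothetical helper: list the absolute URL of every link on the
# page currently loaded in $mech. url_abs() resolves each href
# against the page's base URL and returns a URI object.
sub link_urls {
    my ($mech) = @_;
    return map { $_->url_abs->as_string } $mech->links;
}

# Fetch only when a URL is given on the command line.
if (@ARGV) {
    my $mech = WWW::Mechanize->new( autocheck => 1 );
    $mech->get( shift @ARGV );
    print "$_\n" for link_urls($mech);
}
```

    For example: perl mechlinks.pl 'http://www.perlmonks.com/?node_id=608600' (the script name is arbitrary). $link->url would give the href as written in the page; url_abs resolves it against the page's base.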

    Regards,
    Velusamy R.


    eval"print uc\"\\c$_\""for split'','j)@,/6%@0%2,`e@3!-9v2)/@|6%,53!-9@2~j';