in reply to link parsing

I strongly recommend implementing this using URI!

Something along the lines of (untested and incomplete!):

    use URI;

    foreach my $url (@search) {
        my $uri    = URI->new($url);
        my $scheme = $uri->scheme;
        # relative links have no scheme; otherwise accept only http/https
        if (!defined $scheme || $scheme =~ /^https?$/) {
            # process $uri
        }
    }
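To make the scheme test concrete, here is a small runnable sketch. The sample links in @search and the @keep array are made up for illustration; only URI->new and ->scheme come from the URI module. Note that a fragment-only link such as '#top' has no scheme either, so it passes this filter and needs its own check if you want to drop it.

```perl
use strict;
use warnings;
use URI;

# Hypothetical sample links, standing in for whatever the extractor returned
my @search = (
    'http://example.com/a.html',
    'https://example.com/b.html',
    '/relative/c.html',
    'mailto:someone@example.com',
    '#top',
);

my @keep;
foreach my $url (@search) {
    my $scheme = URI->new($url)->scheme;    # undef for relative references
    # skip anything that names a scheme other than http or https
    next if defined $scheme && $scheme !~ /^https?$/;
    push @keep, $url;
}

print "$_\n" for @keep;
```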

regards,
tomte


Hlade's Law:

If you have a difficult task, give it to a lazy person --
they will find an easier way to do it.

Re: Re: link parsing
by coldfingertips (Pilgrim) on Mar 30, 2004 at 09:35 UTC
    use LWP::Simple qw(!head);
    use LWP::UserAgent;
    use HTML::LinkExtor;
    use URI::URL;

    my $ua = LWP::UserAgent->new;
    my $p  = HTML::LinkExtor->new;
    $ua->timeout(3);

    my $res = $ua->request(HTTP::Request->new(GET => $url),
                           sub { $p->parse($_[0]) });

    ##################
    # Retrieve information from our anony array
    ##################
    for ($p->links) {
        if (defined $_->[2]) {
            push(@search, $_->[2]);
        }
    }

    #################
    # Take known URL-types and rebuild them
    #################
    foreach (@search) {
        if ($_ !~ /^http:\/\//gi) {
            if ($_ !~ /^#/g) {
                if ($_ !~ /mailto:/gi) {
                    my $force_url = "$base$_";
                    push(@search_ready, "$force_url");
                }
            }
        }
        else {
            if ($_ =~ /^\#/g) {
                my $force_url = join("", $url, $_);
                #print "$force_url<br>";
                push(@search_ready, "$force_url");
            }
            else {
                #print "$_<br>";
                push(@search_ready, "$_");
            }
        }
    }
    That is what I have so far, actually. I was using URI! :)

      You load URI::URL with use, but you never actually call it anywhere in your code.
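
      For what it's worth, a rough sketch of what actually using it could look like: URI->new_abs resolves each link against the page address, which would replace the hand-rolled "$base$_" concatenation. The $base value and the sample links are invented for the example, and skipping '#' links is my assumption about the intent of the posted code.

```perl
use strict;
use warnings;
use URI;

# Hypothetical page address and extracted links
my $base   = 'http://example.com/dir/page.html';
my @search = (
    'other.html',
    '/top.html',
    '#section',
    'mailto:me@example.com',
    'http://example.org/x',
);

my @search_ready;
foreach my $link (@search) {
    next if $link =~ /^#/;                  # assumption: ignore in-page anchors
    my $abs = URI->new_abs($link, $base);   # relative links become absolute
    next unless $abs->scheme =~ /^https?$/; # drops mailto: and other schemes
    push @search_ready, $abs->as_string;
}

print "$_\n" for @search_ready;
```

      Links that are already absolute pass through new_abs unchanged, so there is no need for a separate branch for them.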


      He who asks will be a fool for five minutes, but he who doesn't ask will remain a fool for life.

      Chady | http://chady.net/