in reply to Link Extraction when grabbing web page with USER/PASS

Why bother with LinkExtor when you can just:

    use HTML::TokeParser;

    my $parser = HTML::TokeParser->new( \$content );
    my @links;
    while ( my $token = $parser->get_tag(qw( a img )) ) {
        my $link = $token->[1]{href} || $token->[1]{src} || next;
        push @links, $link;
    }

You will need to convert relative links to absolute if that is what you need. See Link Checker for more code.
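A minimal sketch of that relative-to-absolute conversion using the URI module's new_abs method (the base URL and the sample links here are made up for illustration):

    use URI;

    # Hypothetical base: the URL the page was fetched from.
    my $base = 'http://example.com/dir/page.html';

    for my $link ( 'img/pic.png', '/top.html', 'http://other.com/' ) {
        # new_abs resolves $link against $base; absolute links pass through unchanged.
        my $abs = URI->new_abs( $link, $base );
        print "$abs\n";
    }

In practice you would take $base from the response (e.g. $response->base with LWP) rather than hard-coding it.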

cheers

tachyon

s&&rsenoyhcatreve&&&s&n.+t&"$'$`$\"$\&"&ee&&y&srve&&d&&print

Re: Re: Link Extraction when grabbing web page with USER/PASS
by cdherold (Monk) on Mar 04, 2003 at 05:54 UTC
    OK, so you could use either of those, but the problem is: why can't I get anything out with either one? Is it because my web page is grabbed as a string? If so, how do I change that so that I can extract links?

      Eh? Get page as string, stick in $content.

      my $content = <<HTML;
      <a href="http://what.the.com">hello?</a>
      <a href="http://is.dis.org">hello?</a>
      <a href="http://your.net">hello?</a>
      <a href="http://problem">hello?</a>
      HTML

      use HTML::TokeParser;

      my $parser = HTML::TokeParser->new( \$content );
      my @links;
      while ( my $token = $parser->get_tag(qw( a img )) ) {
          my $link = $token->[1]{href} || $token->[1]{src} || next;
          push @links, $link;
      }
      print "@links";

      cheers

      tachyon

      s&&rsenoyhcatreve&&&s&n.+t&"$'$`$\"$\&"&ee&&y&srve&&d&&print