j3f has asked for the wisdom of the Perl Monks concerning the following question:

Hello perlmonks, I'm trying to extract all the links from a webpage and I'm getting stuck in a never-ending loop. How do I stop this?
use HTML::Parse;
use HTML::Element;
use LWP::Simple;
use HTML::FormatText;

$page = get('http://bleep.com/index.php');
$parsed_html = HTML::Parse::parse_html($page);
$link_ref = $parsed_html->extract_links();
@link = @$link_ref;

for ($i=0; $i <= @link; $i++) {
    $links = $link[$i][0];
}
print $links;

Re: How do I stop this from being a never ending loop
by ikegami (Patriarch) on Jan 17, 2009 at 00:48 UTC

    You're continually expanding the size of @link: reading $link[$i][0] when $i is one past the last index autovivifies a new element after the last one. In numeric context, @link returns the number of elements in the array, which is one higher than the index of the last element.
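
    A minimal sketch of that growth, assuming @link holds array refs whose first element is a URL, as the original $link[$i][0] implies. The URLs are made up for illustration. A single read one element past the end is enough to add an element, so the loop condition never becomes false.

```perl
use strict;
use warnings;

# Made-up stand-in for the list returned by extract_links():
# each entry is an array ref whose first element is a URL.
my @link = ( [ 'http://example.com/a' ], [ 'http://example.com/b' ] );

my $before = scalar @link;    # 2 elements, last index is 1
my $i      = scalar @link;    # index one past the end, which "$i <= @link" allows
my $url    = $link[$i][0];    # looks like a read, but autovivifies $link[2] as []
my $after  = scalar @link;    # now 3 elements, so "$i <= @link" stays true

print "before: $before, after: $after\n";                   # before: 2, after: 3
print defined $url ? "url defined\n" : "url undefined\n";   # url undefined
```

    Each pass of the original loop repeats this step, which is why it never terminates.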

    for ($i=0; $i <= @link; $i++)
    should be
    for (my $i=0; $i < @link; $i++)
    or
    for (my $i=0; $i <= $#link; $i++)

    Even simpler would be
    for my $i (0..$#link)
    or even
    for my $link (@link)
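
    A quick sketch comparing the index-based and element-based forms, again with made-up data standing in for the extract_links() result. Both visit each element exactly once, so @link keeps its size. Pushing each URL onto a list, rather than overwriting $links on every pass as the original loop does, keeps all of them for printing.

```perl
use strict;
use warnings;

# Made-up stand-in for the extract_links() result.
my @link = ( [ 'http://example.com/a' ], [ 'http://example.com/b' ] );

# Index-based: $#link is the last valid index, so nothing past the end is touched.
my @by_index;
for my $i ( 0 .. $#link ) {
    push @by_index, $link[$i][0];
}

# Element-based: $link is each array ref in turn.
my @by_element;
for my $link (@link) {
    push @by_element, $link->[0];
}

print "$_\n" for @by_element;
# http://example.com/a
# http://example.com/b
```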

    It's curious that @link (which contains multiple links) is singular and $links (a single link) is plural.

Re: How do I stop this from being a never ending loop
by jeffa (Bishop) on Jan 17, 2009 at 01:40 UTC

    You don't even need to bother with all of that code if you just use the right module:

    use strict;
    use warnings;
    use WWW::Mechanize;

    my $mech = WWW::Mechanize->new;
    $mech->get( 'http://bleep.com/index.php' );
    printf "%s\n", $_->URI for $mech->links;

    See also: WWW::Mechanize::Link

    jeffa

    L-LL-L--L-LL-L--L-LL-L--
    -R--R-RR-R--R-RR-R--R-RR
    B--B--B--B--B--B--B--B--
    H---H---H---H---H---H---
    (the triplet paradiddle with high-hat)
    
      Thanks a lot for the help. I'm definitely going to look into the WWW::Mechanize module; it looks like that's what I need to be using.

Re: How do I stop this from being a never ending loop
by johngg (Canon) on Jan 17, 2009 at 00:55 UTC

    @link in scalar context gives the number of elements in the array, not the index of the last element, so you will try to access an element off the end of the array (I'm not sure whether this is the problem). You need to use $#link instead. So you could do

    for( $i = 0; $i <= $#link; $i++ ) { ...

    A more Perlish way would be

    for my $i ( 0 .. $#link ) { ...

    I'm not sure why you dereference $link_ref and copy the result into @link. You could do

    for my $i ( 0 .. $#{ $link_ref } ) {
        my $links = $link_ref->[ $i ]->[ 0 ];
    }
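
    A sketch of working through the reference directly without copying it into @link. Made-up data stands in for the array ref that extract_links() returns, and the map form is shown as an equivalent alternative, not something from the original post.

```perl
use strict;
use warnings;

# Made-up stand-in for $parsed_html->extract_links(), which returns an array ref.
my $link_ref = [ [ 'http://example.com/a' ], [ 'http://example.com/b' ] ];

# $#{ $link_ref } is the last index of the referenced array.
my @urls;
for my $i ( 0 .. $#{ $link_ref } ) {
    push @urls, $link_ref->[ $i ]->[ 0 ];
}

# Equivalent: map over the dereferenced entries themselves.
my @urls_too = map { $_->[0] } @$link_ref;

print "$_\n" for @urls;
```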

    I hope this is of interest.

    Cheers,

    JohnGG