cormanaz has asked for the wisdom of the Perl Monks concerning the following question:

Does the WWW::Mechanize module have a limit on the number of links it will retrieve from a page? If I point it at a page with many links, I seem to get only the first 99.

TIA...Steve

2005-12-17 Retitled by jdporter, as per Monastery guidelines
Original title: 'WWW::Mechnize link limit?'

Replies are listed 'Best First'.
Re: WWW::Mechanize link limit?
by halley (Prior) on Dec 18, 2005 at 04:28 UTC
    I'm not aware of any limit in WWW::Mechanize itself, but I could be wrong. Most web servers will either choke inadvertently or deliberately throttle rapid requests from a single client to avoid being overloaded. You're probably just hitting such a server-side limit in your tests.

    To deal with this phenomenon, and to be a good network citizen, you should throttle your own requests to something reasonable, such as no more than three pages per second, or more kindly, one page per three seconds. Just call sleep(1) once in a while.
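    The pacing advice above can be sketched as follows. The URLs and the short 0.2-second gap are made up for illustration (use a gap of a few seconds for real crawls), and the actual $mech->get is left commented out so the pacing logic stands alone:

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Time::HiRes qw(time sleep);    # sub-second sleep; bundled with Perl

# Minimal sketch of a throttled fetch loop, assuming you have a list of
# URLs to visit. A short gap is used here only so the sketch runs quickly.
my $min_gap = 0.2;    # minimum seconds between requests
my $last    = 0;
my @stamps;           # when each "request" actually fired

# my $mech = WWW::Mechanize->new( autocheck => 1 );
for my $url (qw( /page1 /page2 /page3 )) {    # stand-ins for real URLs
    my $wait = $min_gap - ( time() - $last );
    sleep($wait) if $wait > 0;    # be a good network citizen
    $last = time();
    push @stamps, $last;
    # $mech->get($url);           # the actual fetch would go here
}

printf "spaced %.2fs and %.2fs apart\n",
    $stamps[1] - $stamps[0], $stamps[2] - $stamps[1];
```

    The first request fires immediately; every later one waits out the remainder of the gap, so bursts never exceed the rate you chose.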

    --
    [ e d @ h a l l e y . c c ]

Re: WWW::Mechanize link limit?
by johnnywang (Priest) on Dec 18, 2005 at 08:00 UTC
    There shouldn't be any size limit. Here's the method in WWW::Mechanize that parses the links:
    sub _extract_links {
        require WWW::Mechanize::Link;

        my $self = shift;

        my $p = HTML::TokeParser->new(\$self->{content});

        $self->{links} = [];

        while (my $token = $p->get_tag( keys %urltags )) {
            my $tag = $token->[0];
            my $url = $token->[1]{$urltags{$tag}};
            my $text;
            my $name;

            if ( $tag eq "a" ) {
                $text = $p->get_trimmed_text("/$tag");
                $text = "" unless defined $text;

                my $onClick = $token->[1]{onclick};
                if ( $onClick && ($onClick =~ /^window\.open\(\s*'([^']+)'/) ) {
                    $url = $1;
                }
            }
            if ( $tag ne "area" ) {
                $name = $token->[1]{name};
            }
            next unless defined $url;   # probably just a name link or <AREA NOHREF...>

            push( @{$self->{links}},
                WWW::Mechanize::Link->new( $url, $text, $name, $tag, $self->base ) );
        }

        # Old extract_links() returned a value.  Carp if someone expects
        # this version to return something.
        if ( defined wantarray ) {
            my $func = (caller(0))[3];
            $self->warn( "$func does not return a useful value" );
        }

        return;
    }
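    Note that the while loop above simply pushes every link it finds, with no counter or cap. If you want to rule out a parser-side limit yourself, one quick sanity check is to count links in a page you control. A minimal core-modules-only sketch, using a regex as a rough stand-in for the HTML::TokeParser pass (safe here only because we generate the markup ourselves):

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Build an in-memory HTML page with 200 anchors, then count them.
my $html = join '',
    map { qq{<a href="/item/$_">item $_</a>\n} } 1 .. 200;

my @hrefs = $html =~ /<a\s+href="([^"]+)"/g;
print scalar(@hrefs), "\n";    # prints 200 -- no drop-off at 99
```

    If a count like this comes back complete but your Mech run doesn't, the truncation is happening on the wire (server throttling, a cut-off response), not in the link extraction.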
Re: WWW::Mechanize link limit?
by Cody Pendant (Prior) on Dec 18, 2005 at 05:15 UTC
    Any page with a lot of links on it, or one page in particular?


    ($_='kkvvttuu bbooppuuiiffss qqffssmm iibbddllffss')
    =~y~b-v~a-z~s; print
Re: WWW::Mechanize link limit?
by Popcorn Dave (Abbot) on Dec 18, 2005 at 21:14 UTC
    Halley is correct. You need to put a sleep(1), or a sleep of some other length, in your code.

    I ran into the exact same problem, except I got stung at 5 links. Putting in a sleep(5) did the trick in my particular case, and I was off and running.

    Useless trivia: In the 2004 Las Vegas phone book there are approximately 28 pages of ads for massage, but almost 200 for lawyers.
Re: WWW::Mechanize link limit?
by Anonymous Monk on Dec 18, 2005 at 04:44 UTC
    There should be no arbitrary limit. If you believe you've found one, and can reproduce it, please submit it as a bug to the email address listed in the Mech docs. Thanks.