Aldebaran has asked for the wisdom of the Perl Monks concerning the following question:

I can hardly describe what I'm trying to do here without good results to look at, but I think the partial results I have now can be coaxed into something useful.

## get content off web
my $start = "http://en.censor.net.ua/";
my $mech  = WWW::Mechanize->new( autocheck => 1 );
$mech->get( $start );
my @links = $mech->find_all_links();
for my $link ( @links ) {
    my $text;
    my $link = WWW::Mechanize::Link->new( {
        text => $text,
    } );
    say "text is $link->text()";
}

Instead of text, I've got a bunch of output that shows I'm not dereferencing this right:

text is WWW::Mechanize::Link=ARRAY(0x22ca880)->text()
text is WWW::Mechanize::Link=ARRAY(0x22ca8b0)->text()
text is WWW::Mechanize::Link=ARRAY(0x22ca820)->text()
text is WWW::Mechanize::Link=ARRAY(0x22ca868)->text()
text is WWW::Mechanize::Link=ARRAY(0x22ca898)->text()
text is WWW::Mechanize::Link=ARRAY(0x22ca808)->text()
text is WWW::Mechanize::Link=ARRAY(0x22ca850)->text()

Without text to feed to a regex, I'm stuck for navigation and would appreciate any tips you have. Thanks in advance.

Re: Using WWW::Mechanize effectively
by Your Mother (Archbishop) on Aug 23, 2014 at 00:26 UTC

    Here you go–

    use strictures;
    use WWW::Mechanize;
    use Encode;

    my $mech = WWW::Mechanize->new( autocheck => undef );

    my $start = shift || die "Give a URL!\n";

    $mech->get($start);

    $mech->success
        or die "Sorry, sucker!\n", $mech->response->as_string;

    for my $link ( $mech->find_all_links ) {
        printf "Link\n * text -> %s\n * URI -> %s\n",
            encode("UTF-8", $link->text) || "na",
            $link->url_abs;
    }

    Excerpts from usage

    perl ~/pm-1098382 http://yahoo.co.jp
    Link
     * text -> na
     * URI -> http://bb.yahoo.co.jp/
    Link
     * text -> 投資家情報
     * URI -> http://www.yahoo.co.jp/r/fiv
    
    perl ~/pm-1098382 http://en.censor.net.ua
    Link
     * text -> na
     * URI -> http://en.censor.net.ua/favicon.ico
    Link
     * text -> "Censor.NET"
     * URI -> http://en.censor.net.ua/
    Link
     * text -> Яндекс цитирования
     * URI -> http://yandex.ru/cy?base=0&host=censor.net.ua
    

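    You already have WWW::Mechanize::Link objects from find_all_links, so by constructing new ones inside the loop you are just kind of mangling them into something odd. On top of that, a method call such as $link->text() is not interpolated inside a double-quoted string, which is why your output shows the stringified object followed by a literal ->text(). You can see from the output that you will have to filter out JS, <link/> tags, and the like. :P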
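
For what it's worth, here is a minimal sketch of the two points above that runs without network access. `Demo::Link` is a hypothetical stand-in invented for this example (not part of WWW::Mechanize); it only mimics the blessed-array shape and the `text` accessor of the real link objects, and the example.com URLs are placeholders. The sketch calls the method outside the string before interpolating, and uses `grep` to drop links that carry no text.

```perl
use strict;
use warnings;
use feature 'say';

# Hypothetical stand-in for WWW::Mechanize::Link so this runs offline.
# Like the real class, it is a blessed array with a text() accessor.
package Demo::Link {
    sub new  { my ( $class, $url, $text ) = @_; return bless [ $url, $text ], $class }
    sub url  { $_[0][0] }
    sub text { $_[0][1] }
}

# Placeholder data standing in for the result of $mech->find_all_links.
my @links = (
    Demo::Link->new( 'http://example.com/favicon.ico', undef ),
    Demo::Link->new( 'http://example.com/',            'Home' ),
    Demo::Link->new( 'http://example.com/news',        'News' ),
);

# Keep only the links that actually carry some text.
my @with_text = grep { defined $_->text && length $_->text } @links;

my @lines;
for my $link (@with_text) {
    # Method calls are not interpolated inside double quotes, so call the
    # method first and interpolate the plain scalar instead.
    my $text = $link->text;
    push @lines, "text is $text";
}

say for @lines;
```

With real WWW::Mechanize::Link objects the loop body should look the same; only the source of `@links` changes.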