artche has asked for the wisdom of the Perl Monks concerning the following question:

Hi,

I'm creating a web crawler and using WWW::Mechanize::Link to extract links from content.

Almost everything works fine, but I can't get the attributes of a link.

A fragment of the code:
    use WWW::Mechanize;
    use Data::Dumper;
    use DBI;
    ...
    my $mech = WWW::Mechanize->new(
        stack_depth => 0,
        autocheck   => 0,
        onerror     => undef,
    );
    $mech->timeout(30);
    $mech->agent_alias( 'Windows IE 6' );
    $mech->max_redirect( 0 );
    $mech->get("http://".$placement_page_url);
    ...
    my @links = $mech->links();
    foreach $link (@links) {
        ...
        print $link->attrs." attrs\n";
        %dump = Dumper $link->attrs;
        print %dump;
        ...
    }

I can parse almost every piece of information about the links: I get url_abs, base, text, tag, etc. But the attributes come back as a hash ref. I can display it through Data::Dumper, but how do I extract values from it? I need attributes like "rel" and "title". A simple hash ref lookup (key -> value) doesn't work.

Monks, I'm hoping that someone knows how to read the attributes and can help me.

artche

Re: Getting link attributes from WWW::Mechanize?
by Your Mother (Archbishop) on Dec 16, 2010 at 23:42 UTC

    It's just a hash ref so dereference it with the key you want. E.g.,

        use warnings;
        use strict;
        use WWW::Mechanize;

        my $mech = WWW::Mechanize->new(
            stack_depth => 0,
            autocheck   => 0,
            onerror     => undef,
        );
        $mech->agent_alias("Windows IE 6");
        $mech->get("http://cnn.com");

        for my $link ( $mech->links ) {
            print "  URI: ", $link->url_abs, $/;
            print "Title: ", $link->attrs->{title} || "[n/a]", $/, $/;
        }

        __END__
          URI: http://www.cnn.com/
        Title: [n/a]

          URI: http://edition.cnn.com/
        Title: CNN INTERNATIONAL

          URI: http://www.cnnmexico.com/
        Title: CNN M?XICO

          URI: javascript:cnn_initeditionhtml(3);
        Title: [n/a]

    As you can see from México, there may be encoding issues you'll need to address.
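
For the "rel" and "title" attributes asked about in the question, here is a minimal sketch of the same dereferencing that runs without a network connection. The $attrs hash ref is a hypothetical stand-in for whatever $link->attrs returns, and its keys and values are invented for illustration. It also hints at why the original fragment misbehaved: concatenating a hash ref with a string prints something like HASH(0x...), and Dumper returns a string, so assigning it to %dump does not produce a usable hash.

```perl
use strict;
use warnings;

# Hypothetical stand-in for the hash ref returned by $link->attrs;
# the keys and values below are made up for illustration.
my $attrs = {
    href  => 'http://example.com/',
    rel   => 'nofollow',
    title => 'Example site',
};

# One value: arrow-dereference with the key you want.
my $rel = $attrs->{rel};

# A missing attribute comes back as undef, so supply a default.
# (The // operator needs Perl 5.10 or later.)
my $class = $attrs->{class} // '[n/a]';

print "rel:   $rel\n";
print "title: $attrs->{title}\n";
print "class: $class\n";

# All attributes: dereference the whole hash with %$attrs.
for my $name ( sort keys %$attrs ) {
    print "$name = $attrs->{$name}\n";
}
```

In the crawler loop the same lookups would presumably be written as $link->attrs->{rel} and $link->attrs->{title}, with a default for links that lack the attribute.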