Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

I need to extract all the links from a web page, so I am using the WWW::Mechanize module. I can extract every link whose source is in one of these tags, for example:

<a href=...> <area href=...> <frame src=...> <iframe src=...> <link href=...> <meta content=...>

This works because the Mechanize module defines these tags, so I can extract all of those links. But there is one link I could not extract from the source. The source looks like "location='http://www.loopnet.com/looplink/lund/qryradio.aspx". The module has no definition for that, and I want to extract that link.

This is my code

    use strict;
    use WWW::Mechanize;

    my $mech = WWW::Mechanize->new();
    $mech->get("any url");

    my @links = $mech->links();
    for my $link (@links) {
        printf "%s\n", $link->url;
    }

Please help me solve this problem.

Regards,

Senthil

Replies are listed 'Best First'.
Re: Link Extracter using Mechanizemodule
by Corion (Patriarch) on Aug 26, 2011 at 07:10 UTC

    HTML has no location attribute. It is quite unclear to me what kind of tag you are trying to extract. Please show a short (5 lines) HTML example (in <code>...</code> tags) that shows the tag from which you want to extract the link.

        Then you will need to inspect what ->content gives you and use regular expressions. The link methods of WWW::Mechanize do not deal with arbitrary (unlinked) URLs. They only extract certain tags.
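A minimal sketch of the regular-expression approach, assuming the URL sits in a JavaScript-style location='...' assignment somewhere in the page source. The helper name extract_location_urls and the sample markup are made up for illustration, since the original page source was never posted; only the URL comes from the question.

```perl
use strict;
use warnings;

# Hypothetical helper: collect URLs assigned via location='...' or
# location.href="..." anywhere in a string of HTML/JavaScript.
sub extract_location_urls {
    my ($html) = @_;
    my @urls;
    # \1 requires the closing quote to match the opening one.
    while ( $html =~ /\blocation(?:\.href)?\s*=\s*(["'])([^"']+)\1/gi ) {
        push @urls, $2;
    }
    return @urls;
}

# Assumed sample input; in a real script this would be $mech->content.
my $html = <<'HTML';
<a href="#" onclick="location='http://www.loopnet.com/looplink/lund/qryradio.aspx'">Search</a>
HTML

print "$_\n" for extract_location_urls($html);
```

With WWW::Mechanize, the same helper could be called as extract_location_urls( $mech->content ) after the get. A pattern like this is only as good as the guess about how the page writes the assignment, so it may need adjusting once the real source is inspected.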

        Take a look at HTML::Tree::Scanning. Using HTML::TreeBuilder and the look_down method mentioned in the article works well for me.
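A rough sketch of the look_down approach, assuming the URL lives in an onclick attribute. That is a guess, because the tag in question was never shown, and both the sample markup and the helper name onclick_location_urls are hypothetical. look_down accepts an attribute name paired with a regular expression and returns the elements whose attribute value matches.

```perl
use strict;
use warnings;
use HTML::TreeBuilder;

# Hypothetical helper: find elements whose onclick attribute mentions
# "location" and pull the quoted URL out of that attribute.
sub onclick_location_urls {
    my ($html) = @_;
    my $tree = HTML::TreeBuilder->new_from_content($html);
    my @urls;
    for my $el ( $tree->look_down( onclick => qr/\blocation\b/i ) ) {
        my $js = $el->attr('onclick');
        if ( $js =~ /\blocation(?:\.href)?\s*=\s*(["'])([^"']+)\1/i ) {
            push @urls, $2;
        }
    }
    $tree->delete;    # free the parse tree explicitly
    return @urls;
}

# Assumed sample input; in a real script this would be $mech->content.
my $html = <<'HTML';
<html><body>
<a href="#" onclick="location='http://www.loopnet.com/looplink/lund/qryradio.aspx'">Search</a>
</body></html>
HTML

print "$_\n" for onclick_location_urls($html);
```

Compared with running a regular expression over the whole page, this limits the match to one attribute of real elements, so text that merely looks like an assignment elsewhere in the page is ignored. It will not find assignments inside <script> blocks.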