Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

I'm using WWW::Mechanize and I'm trying to figure out how to get all the outbound links in a page. Getting all links within the same domain is easy enough:
use strict;
use warnings;
use Data::Dumper;
use WWW::Mechanize;
my $mech = WWW::Mechanize->new;
$mech->get('http://example.com/');
# Escape the host name so its dots match literally inside the regex.
my $hostname = quotemeta( $mech->uri()->host );
my @inbound_links = $mech->find_all_links( url_abs_regex => qr!^https?://$hostname/! );
print Dumper(\@inbound_links);
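The url_abs_regex option tests each link's absolute URL against the pattern. Here is a minimal sketch that runs the same pattern over a hard-coded list of URLs, so it works without WWW::Mechanize or network access. The URLs are hypothetical stand-ins for the absolute URLs of a page's links. It shows which strings the pattern accepts:

```perl
use strict;
use warnings;

# Hypothetical absolute URLs standing in for the links found on a page.
my @urls = (
    'http://example.com/about',
    'https://example.com/',
    'http://example.com',          # no trailing slash: the pattern requires one
    'http://www.example.com/',     # different host string: not matched
    'http://perlmonks.org/',
);

# Host name escaped with quotemeta, as in the script above.
my $hostname   = quotemeta('example.com');
my $inbound_re = qr!^https?://$hostname/!;

# Keep only the URLs that match the inbound pattern.
my @inbound = grep { $_ =~ $inbound_re } @urls;
print "$_\n" for @inbound;
```

Only the first two URLs are printed. The pattern matches exactly `example.com` followed by a slash, so `www.example.com` and a bare `http://example.com` both fall outside it.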
However, I'm having a brain fart and cannot figure out how to do the inverse: how do I get all links NOT in the current domain?

Re: Get all outbound links with WWW::Mechanize?
by Corion (Patriarch) on Mar 23, 2011 at 15:43 UTC

    The easy way is to fetch all links and filter them with your regular expression afterwards:

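    ...
    # Test url_abs rather than url: a relative href such as "/about" never
    # matches the pattern and would otherwise be reported as outbound.
    my @outbound_links = grep { $_->url_abs !~ qr!^https?://$hostname/! }
                         $mech->find_all_links;
    ...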
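    It matters whether you filter on the href as written in the HTML or on its absolute form. The sketch below fakes both values for a few links (the pairs are hypothetical stand-ins for what a link object's url and url_abs methods would return) and applies the grep-based filter to each:

```perl
use strict;
use warnings;

# Hypothetical links: [ href as written in the HTML, its absolute form ].
my @links = (
    [ '/about',                 'http://example.com/about' ],
    [ 'http://example.com/faq', 'http://example.com/faq'   ],
    [ 'http://perlmonks.org/',  'http://perlmonks.org/'    ],
    [ 'https://cpan.org/',      'https://cpan.org/'        ],
);

my $hostname   = quotemeta('example.com');
my $inbound_re = qr!^https?://$hostname/!;

# Filtering on the raw href wrongly treats the relative '/about' as outbound.
my @outbound_raw = map { $_->[0] } grep { $_->[0] !~ $inbound_re } @links;

# Filtering on the absolute form gives the intended result.
my @outbound_abs = map { $_->[1] } grep { $_->[1] !~ $inbound_re } @links;

print "raw: @outbound_raw\n";
print "abs: @outbound_abs\n";
```

    Note also that grep keeps everything that fails the inbound test, so mailto: or javascript: links on a real page would end up in the outbound list as well.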

    The harder way is to use a negative lookahead, so that the regular expression matches only URLs that do not start with your host:

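    # $hostname has already been through quotemeta, so interpolate it as-is;
    # wrapping it in \Q...\E again would escape its backslashes and the
    # lookahead would never match.
    my @outbound_links = $mech->find_all_links(
        url_abs_regex => qr<^(?!https?://$hostname/)>,
    );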
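    A zero-width negative lookahead at ^ turns the pattern into a pure test: it matches, consuming nothing, exactly when the URL does not begin with the inbound prefix. The sketch below runs it over hypothetical URL strings. It also shows what happens if an already-quotemeta'd host is wrapped in \Q...\E a second time:

```perl
use strict;
use warnings;

# Hypothetical absolute URLs.
my @urls = (
    'http://example.com/about',
    'http://perlmonks.org/',
    'mailto:someone@example.com',
);

my $host   = 'example.com';       # raw, unescaped host name
my $quoted = quotemeta($host);    # 'example\.com'

# Matches only when the URL does NOT start with http(s)://example.com/
my $outbound_re = qr<^(?!https?://$quoted/)>;
my @outbound    = grep { $_ =~ $outbound_re } @urls;

# Pitfall: \Q...\E on an already-escaped string also escapes the backslash,
# so the lookahead looks for a literal "example\.com" and never matches.
my $double_re = qr<^(?!https?://\Q$quoted\E/)>;
my @double    = grep { $_ =~ $double_re } @urls;

print "outbound: @outbound\n";
print "double-escaped: @double\n";
```

    The correctly escaped pattern keeps the perlmonks.org and mailto: URLs. The double-escaped one lets all three through, inbound link included.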
Re: Get all outbound links with WWW::Mechanize?
by educated_foo (Vicar) on Mar 23, 2011 at 15:43 UTC
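    Try url_abs_regex => qr{^https?://(?!$hostname/)[^/]*/}

    Use a delimiter other than ! here: with qr!...!, the ! in (?! would end the pattern early. The / inside the lookahead stops hosts that merely begin with your hostname, such as example.com.au, from being treated as inbound.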
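    A sketch comparing the lookahead with and without the trailing slash, using a hypothetical look-alike host (example.com.au). It uses {} as the regex delimiter so that the ! in (?! does not terminate the pattern:

```perl
use strict;
use warnings;

# Hypothetical absolute URLs; example.com.au is an invented look-alike host.
my @urls = (
    'http://example.com/about',
    'http://example.com.au/',
    'http://perlmonks.org/',
);

my $hostname = quotemeta('example.com');

# The lookahead sits right after the scheme; [^/]* then consumes the host.
my $prefix_re = qr{^https?://(?!$hostname)[^/]*/};    # host must not START with example.com
my $exact_re  = qr{^https?://(?!$hostname/)[^/]*/};   # host must not BE example.com

my @by_prefix = grep { $_ =~ $prefix_re } @urls;
my @by_exact  = grep { $_ =~ $exact_re } @urls;

print "prefix: @by_prefix\n";
print "exact:  @by_exact\n";
```

    Without the slash, example.com.au is dropped along with the real inbound link. Either version also requires an http or https scheme and a slash after the host, so links such as mailto: are left out of the outbound list.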