Re: how do i split a link

You could combine HTML::LinkExtor and URI:

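use strict;
use warnings;
use HTML::LinkExtor;
use URI;

my @links = ();
my $html = do { local $/; <DATA> };

sub extract_links {
    my ($tag, %attr) = @_;
    return unless $tag eq 'a';            # return, not next: this is a sub, not a loop body
    return unless defined $attr{href};

    my $uri = URI->new($attr{href});
    return unless $uri->can('host');      # relative and mailto: links have no host part
    my $host = $uri->host;
    return unless defined $host && length $host;

    # Keep the last two labels (www.foo.com -> foo.com); shorter hosts are kept whole
    my @parts = split /\./, $host;
    push @links, join '.', @parts > 1 ? @parts[-2, -1] : @parts;
}

my $p = HTML::LinkExtor->new(\&extract_links);
$p->parse($html);
$p->eof;                                  # flush anything still buffered in the parser

print join("\n", @links), "\n";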

__DATA__
<a href="http://www.foo.com">description</a>
<a href='http://www.foo.com'>image here</a>

Of course, you might want to add some more error checking...
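As a rough illustration of the kind of checking meant here, the sketch below pulls the "last two labels" step into a helper that skips missing hosts and keeps short hosts whole. The name `last_labels` and its `$n` parameter are hypothetical, not part of HTML::LinkExtor or URI, and it only needs core Perl. Note that for a host such as foo-bar-publishers.co.uk it still returns `co.uk`: taking the last two labels is a naive heuristic, and country-code domains would need something like a public suffix list.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of HTML::LinkExtor or URI: keep the last $n
# dot-separated labels of a host name, skipping the cases a bare
# split/slice does not check for.
sub last_labels {
    my ($host, $n) = @_;
    $n = 2 unless defined $n;

    # No host at all (e.g. a relative or mailto: link): return undef
    return unless defined $host && length $host;

    my @parts = split /\./, lc $host;

    # Hosts with $n or fewer labels (e.g. "localhost") are returned whole
    my $start = @parts > $n ? @parts - $n : 0;
    return join '.', @parts[$start .. $#parts];
}

for my $host ('www.foo.com', 'foo-bar-publishers.co.uk', 'localhost', undef) {
    my $short = last_labels($host);
    printf "%-26s => %s\n",
        defined $host  ? $host  : '(none)',
        defined $short ? $short : '(skipped)';
}
```

Returning undef for a missing host lets the caller decide whether to skip the link or report it, instead of pushing a bogus entry.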

gav^



(crazyinsomniac) Re^2: how do i split a link
by crazyinsomniac (Prior) on Apr 14, 2002 at 10:16 UTC

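use strict;
use warnings;
use HTML::LinkExtor;

my @links = ();
my $html = join '', <DATA>;  # much more elegant than do { local $/; <DATA> };

sub extract_links {
    # Callback args are ($tag, attr_name => url, ...); keep the first URL
    my ($tag, undef, $url) = @_;

    # With a base URL, $url is an absolute URI object; mailto: links have no host
    if ($tag eq 'a' && ref $url && $url->can('host')) {
        push @links, $url->host;
    }
}

my $p = HTML::LinkExtor->new(\&extract_links, 'http://foobar.com');
$p->parse($html);
$p->eof;

print join("\n", @links), "\n";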

__DATA__
<a href="http://www.foo.com">description</a>
<a href='http://www.foo.com'>image here</a>
<A href='http://foo-bar-publishers.co.uk'>image here</a>
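Both ways of slurping the input produce the same string. The sketch below checks that on an in-memory filehandle (the `$text` sample just reuses two of the links from the data section): `local $/` makes a single read return everything at once, while `join '', <$fh>` reads a list of lines and glues them back together. Which one is more elegant is a matter of taste; the `local $/` form avoids building the intermediate list of lines.

```perl
use strict;
use warnings;

my $text = qq{<a href="http://www.foo.com">description</a>\n}
         . qq{<a href='http://www.foo.com'>image here</a>\n};

# Idiom 1: undefine the input record separator, so one read
# returns the rest of the file as a single string
open my $fh1, '<', \$text or die "open: $!";
my $slurp_local = do { local $/; <$fh1> };
close $fh1;

# Idiom 2: read every line in list context, then join them back up
open my $fh2, '<', \$text or die "open: $!";
my $slurp_join = join '', <$fh2>;
close $fh2;

print $slurp_local eq $slurp_join ? "same content\n" : "different\n";
```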

Update: no need for a patch; it's already in there (at least in the version with $VERSION = sprintf("%d.%02d", q$Revision: 1.31 $ =~ /(\d+)\.(\d+)/);).

______crazyinsomniac_____________________________
Of all the things I've lost, I miss my mind the most.
perl -e "$q=$_;map({chr unpack qq;H*;,$_}split(q;;,q*H*));print;$q/$q;"