in reply to how do i split a link

You could combine HTML::LinkExtor and URI:
use HTML::LinkExtor;
use URI;

my @links = ();
my $html = do { local $/; <DATA> };

sub extract_links {
    my ($tag, %attr) = @_;
    return unless $tag eq 'a';    # return, not next: this is a sub, not a loop body
    my @parts = split /\./, URI->new($attr{href})->host;
    my $host = join '.', @parts[-2, -1];
    push @links, $host;
}

my $p = HTML::LinkExtor->new(\&extract_links);
$p->parse($html);
print join "\n", @links;

__DATA__
<a href="http://www.foo.com">description</a>
<a href='http://www.foo.com'>image here</a>
Of course you might want to add some error checking...
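As a rough idea of what that error checking might look like, here is a minimal sketch. The sub name extract_host, the @hosts array and the sample HTML are made up for illustration, and it assumes the only failure cases worth guarding against are a missing href and URIs that have no host (mailto: links, relative paths):

```perl
use strict;
use warnings;
use HTML::LinkExtor;
use URI;

my @hosts;

# Callback for HTML::LinkExtor: receives the tag name followed by
# name/value pairs for the link attributes found on that tag.
sub extract_host {
    my ($tag, %attr) = @_;
    return unless $tag eq 'a';                      # only look at anchors
    my $href = $attr{href};
    return unless defined $href && length $href;    # no usable href

    my $uri = URI->new($href);
    return unless $uri->can('host');                # e.g. mailto: or relative links
    my $host = $uri->host;
    return unless defined $host && length $host;

    push @hosts, $host;
}

# Hypothetical sample input, not taken from the original question.
my $html = <<'HTML';
<a href="http://www.foo.com/">ok</a>
<a href="mailto:someone@foo.com">mail</a>
<a href="/relative/path">relative</a>
HTML

my $p = HTML::LinkExtor->new(\&extract_host);
$p->parse($html);
$p->eof;

print "$_\n" for @hosts;
```

Only the first link should end up in @hosts; the other two are skipped quietly instead of dying on a missing host method.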

gav^

(crazyinsomniac) Re^2: how do i split a link
by crazyinsomniac (Prior) on Apr 14, 2002 at 10:16 UTC
    You might find reading the module, as well as its documentation, saves typing ;)
    use HTML::LinkExtor;

    my @links = ();
    my $html = join '', <DATA>; # much more elegant than => do { local $/; <DATA> };

    sub extract_links {
        my ($tag, undef, $url) = @_;
        if ($tag eq 'a') {
            push @links, $url->host;
        }
    }

    my $p = HTML::LinkExtor->new(\&extract_links, 'http://foobar.com');
    $p->parse($html);
    print join "\n", @links;

    __DATA__
    <a href="http://www.foo.com">description</a>
    <a href='http://www.foo.com'>image here</a>
    <A href='http://foo-bar-publishers.co.uk'>image here</a>
    Also, this "foo.com" request is rather silly, considering all the weirdo naming conventions out there (city.county.state.us ...)
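To make that concrete, here is a small sketch of what the "keep the last two labels" approach does to such names. The sub name last_two_labels is made up for illustration; it uses no modules and makes no attempt to handle the odd cases correctly, which would presumably need a table of known suffixes:

```perl
use strict;
use warnings;

# Naive approach: keep only the last two dot-separated labels of a host name.
sub last_two_labels {
    my ($host) = @_;
    my @parts = split /\./, $host;
    return $host if @parts <= 2;        # nothing to trim
    return join '.', @parts[-2, -1];
}

for my $host (qw(www.foo.com foo-bar-publishers.co.uk city.county.state.us)) {
    printf "%-28s => %s\n", $host, last_two_labels($host);
}
```

www.foo.com comes out as foo.com as intended, but foo-bar-publishers.co.uk is reduced to co.uk and city.county.state.us to state.us, neither of which identifies the site.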

    update: no need for a patch, it's in there (at least in $VERSION = sprintf("%d.%02d", q$Revision: 1.31 $ =~ /(\d+)\.(\d+)/);).

     
    ______crazyinsomniac_____________________________
    Of all the things I've lost, I miss my mind the most.
    perl -e "$q=$_;map({chr unpack qq;H*;,$_}split(q;;,q*H*));print;$q/$q;"

      Thanks, I never knew about that 3rd parameter. Perhaps you can suggest a patch to the documentation?

      gav^