in reply to Who wants to help me adjust LinkExtor::Simple?

Carrying on with my recent fetish for HTML::Parser, I provide this. It grabs the 'src', 'href', 'background' and any other attributes you care to add to @ATTR, from any HTML element. It could very easily be changed to limit which tags are recognized, etc. Enjoy.

#!perl -w
use strict;
use HTML::Parser;

# list of html attributes which contain UR[IL]s
my @ATTR = qw( src href background );

my $parser = HTML::Parser->new(
    start_h => [ \&parser_tag, 'self, attr' ]
);
$parser->parse_file( *DATA );

print join($/, @{ $parser->{_links} }), $/;

sub parser_tag {
    my ($self, $attr) = @_;
    while ( my ($attr_n, $attr_v) = each %$attr ) {
        next unless grep $_ eq $attr_n, @ATTR;
        push @{ $self->{_links} }, $attr_v;
    }
}

__DATA__
<html>
<head>
<title>Sample Page</title>
<script language="JavaScript" src="/foo.js"></script>
<style type="text/css" src="/bar.css"></style>
</head>
<body background="/qux.jpg">
<a href="/some-link"><img src="/some-image" /></a>
</body>
</html>
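For the "limit which tags are recognized" variation mentioned above, here is one rough way it might look. This is only a sketch, not part of the original script: the %LINK_ATTR table and the extract_links() helper are names I made up for illustration, and it asks HTML::Parser for 'tagname' in the argspec so the handler can skip tags that aren't listed.

```perl
#!perl -w
use strict;
use HTML::Parser;

# Hypothetical lookup table (not in the original script):
# tag name => attributes worth collecting on that tag.
my %LINK_ATTR = (
    a   => [ 'href' ],
    img => [ 'src' ],
);

# Hypothetical helper: parse a string of HTML and return the links found.
sub extract_links {
    my ($html) = @_;
    my @links;

    my $parser = HTML::Parser->new(
        api_version => 3,
        start_h     => [
            sub {
                my ($tagname, $attr) = @_;

                # ignore any tag that is not in the table
                my $wanted = $LINK_ATTR{$tagname} or return;

                for my $name (@$wanted) {
                    push @links, $attr->{$name} if defined $attr->{$name};
                }
            },
            'tagname,attr',
        ],
    );

    $parser->parse($html);
    $parser->eof;

    return @links;
}

print join($/, extract_links(
    '<body background="/qux.jpg"><a href="/some-link"><img src="/some-image" /></a></body>'
)), $/;
```

With this table the body background is skipped and only the link and the image come back. Add more tags or attributes to %LINK_ATTR as needed.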