in reply to Repost of regex

Generally, it is unwise (at best) not to use a CPAN module when one exists and does what you want. That being said... if you're trying to write a web spider, why don't you save yourself some time and use LWP::RobotUA along with HTML::TokeParser or HTML::Parser (my personal favorite), as other monks suggested?

Doing this task (extracting the links to follow and analyzing the <META> tags) with HTML::Parser is a matter of a few lines, and LWP already fetches the HTML for you. According to recipe 20.3 in the Perl Cookbook, you could also use HTML::LinkExtor to extract the links, as this code shows (copied verbatim):

    use HTML::LinkExtor;

    $parser = HTML::LinkExtor->new(undef, $base_url);
    $parser->parse_file($filename);
    @links = $parser->links;
    foreach $linkarray (@links) {
        my @element  = @$linkarray;
        my $elt_type = shift @element;    # element type

        # possibly test whether this is an element we're interested in
        while (@element) {
            # extract the next attribute and its value
            my ($attr_name, $attr_value) = splice(@element, 0, 2);
            # ... do something with them ...
        }
    }

However, every time I've needed to do this, I have also needed to parse the rest of the HTML, so I've always used HTML::Parser for the link extraction too.
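
To give an idea of what "a few lines" with HTML::Parser can look like, here is a minimal sketch, not a finished spider. It assumes the version 3 API of HTML::Parser; the sub name extract_links_and_meta and the sample HTML are made up for illustration, and it only looks at <A HREF> and <META NAME> tags, so adapt it to whatever your spider actually needs.

```perl
use strict;
use warnings;
use HTML::Parser;

# Hypothetical helper (name is an assumption): collect <a href> targets
# and <meta name=... content=...> pairs from a string of HTML.
sub extract_links_and_meta {
    my ($html) = @_;
    my (@links, %meta);

    my $parser = HTML::Parser->new(
        api_version => 3,
        start_h     => [
            sub {
                # tagname arrives lowercased; attr is a hash reference
                my ($tag, $attr) = @_;
                if ($tag eq 'a' && defined $attr->{href}) {
                    push @links, $attr->{href};
                }
                elsif ($tag eq 'meta' && defined $attr->{name}) {
                    my $content = $attr->{content};
                    $meta{ lc $attr->{name} } = defined $content ? $content : '';
                }
            },
            'tagname, attr',
        ],
    );

    $parser->parse($html);
    $parser->eof;    # signal end of document to flush any buffered text

    return (\@links, \%meta);
}

# Made-up sample document, only for demonstration
my $html = <<'HTML';
<html><head><meta name="robots" content="noindex,follow"></head>
<body><a href="/a.html">A</a> <a href="http://example.com/b">B</a></body></html>
HTML

my ($links, $meta) = extract_links_and_meta($html);
print "link: $_\n" for @$links;
print "meta $_ = $meta->{$_}\n" for sort keys %$meta;
```

In a real spider you would feed it the content of the response you got from LWP instead of the sample string, and resolve relative links against the page's base URL (the URI module can do that) before following them.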

Hope this all helps a bit. Feel free to ask further questions.

Best regards

-lem, but some call me fokat