Re: Regex isn't performing like I think it should

If the goal here is to extract substrings from the long URL, and they are wrapped in a consistent set of 'tags' (not necessarily HTML tags), then maybe you could look at Text::Balanced.
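For a quick feel of how extract_tagged handles arbitrary 'tags', here's a minimal sketch. The query string and the markers 'id=' and '&' are made up for illustration; substitute whatever actually surrounds the piece you want. Keep in mind that extract_tagged only matches at the start of the string (or at its current pos(), after optional leading whitespace). It does not search ahead, which is why the snippet further down wraps it in extract_multiple to scan a whole page. The markers are also treated as regex patterns, so characters like '?' or '.' in them would need escaping.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Text::Balanced qw(extract_tagged);

# Hypothetical input: the wanted value sits between the markers 'id=' and '&'.
# The string starts with the opening marker because extract_tagged only
# matches at the current position; it does not scan forward.
my $url = 'id=12345&session=abc';

# In list context extract_tagged returns six items:
#   [0] whole match  [1] remainder  [2] prefix
#   [3] opening tag  [4] text between the tags  [5] closing tag
# Both markers are treated as regex patterns, so escape any metacharacters.
my @parts = extract_tagged( $url, 'id=', '&' );

print "$parts[4]\n";    # prints "12345"
```

The remainder in $parts[1] ('session=abc' here) is what you would feed to the next call if you wanted to keep going.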

Not long ago, I asked a question about Text::Balanced and received some really great help, including feedback from the module's author.

Here's a copy of my code snippet from that thread. It finds all of the substrings that are wrapped in the 'tags' (which can be any arbitrary text, and need not be the same for the opening and closing tags) and then strips each one down to the part I needed.
This might be similar to what you're trying to do.


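use Text::Balanced qw(extract_multiple extract_tagged);

# Find all the links in the page contents, rejecting any from bianca.
# ($response is assumed to be an HTTP::Response object, e.g. from LWP::UserAgent.)
# The trailing 1 tells extract_multiple to discard the text between matches.
my @data = extract_multiple( $response->content,
                [ sub { extract_tagged($_[0],
                            '<a href="http://', '</a>',
                            undef,
                            { reject => ['bianca.com'] } ) } ],
                undef, 1 );

# Strip each link down to its bare address; this is
# what's needed to insert into the database.
for my $link (@data) {
    my @temp = extract_tagged($link, '<a href="http://', '">', undef, undef);
    $link = $temp[4];    # element 4 is the text between the two tags
}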
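If you want to try the two passes without fetching a page, here's a minimal self-contained sketch that runs the same calls on a small hard-coded string. The HTML and the example.com / example.org / bianca.com links are made up for illustration and stand in for $response->content; the addresses are collected into a separate array rather than overwriting @data in place.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Text::Balanced qw(extract_multiple extract_tagged);

# Made-up page content standing in for $response->content.
my $html = '<p><a href="http://example.com/a">A</a> '
         . '<a href="http://bianca.com/x">B</a> '
         . '<a href="http://example.org/b">C</a></p>';

# Pass 1: pull out every complete <a href="http://...">...</a> element,
# rejecting any whose contents match 'bianca.com'. The trailing 1 tells
# extract_multiple to drop the text that no extractor matched.
my @data = extract_multiple( $html,
    [ sub { extract_tagged( $_[0], '<a href="http://', '</a>',
                            undef, { reject => ['bianca.com'] } ) } ],
    undef, 1 );

# Pass 2: keep only the text between '<a href="http://' and '">'.
my @addresses;
for my $link (@data) {
    my @temp = extract_tagged( $link, '<a href="http://', '">' );
    push @addresses, $temp[4];
}

print "$_\n" for @addresses;
```

This should print example.com/a and example.org/b, one per line, with the bianca.com link skipped.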

Good luck!
