in reply to Regex isn't performing like I think it should
Not long ago, I asked a question about Text::Balanced and received some really great help, including feedback from the author of the module.
Here's a copy of my code snippet from that page. It looks for all of the 'substrings' that are wrapped in the 'tags' (which can be any arbitrary text, and need not be the same for the beginning and ending tags) and strips them out.
This might be similar to what you're trying to do.
Good luck!

    # Find all the URLs in the page contents, rejecting any from bianca
    @data = extract_multiple(
        $response->content,
        [ sub { extract_tagged($_[0], '<a href="http://', '</a>', undef,
                               { reject => ['bianca.com'] }) } ],
        undef, 1
    );

    # Loop through and strip each URL down to its bare address; this is
    # what's needed to insert into the database
    for (my $i = 0; $i <= $#data; $i++) {
        my @temp = extract_tagged($data[$i], '<a href="http://', '">', undef, undef);
        $data[$i] = $temp[4];
    }
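In case it isn't obvious why the loop keeps element 4: as I read the Text::Balanced docs, extract_tagged in list context returns the extracted text, the remainder, the prefix, the opening tag, the text between the tags, and the closing tag, in that order. Here's a minimal sketch of that on a single link. The sample URL and the variable names are made up for illustration and aren't from my original script.

```perl
use strict;
use warnings;
use Text::Balanced qw(extract_tagged);

# Hypothetical sample link, standing in for one element of @data
my $link = '<a href="http://example.com/page">Example</a>';

# Same opening and closing "tags" as the loop in my snippet
my @fields = extract_tagged($link, '<a href="http://', '">');

# List-context return values, per the Text::Balanced documentation:
#   [0] extracted text (tags included)   [1] remainder of the input
#   [2] skipped prefix                   [3] opening tag
#   [4] text between the tags            [5] closing tag
my $address = $fields[4];

print "$address\n";    # prints example.com/page
```

So the "bare address" that goes into the database is just whatever sits between the opening tag and the closing `">`.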