imo this is much easier and more reliable than trying to build, debug and maintain a complex regex on something as loose as HTML.
It's also easy to adapt if the spec changes (and doesn't it always?) or to other parsing tasks as they arise.
I believe using a parser (and there are many to suit all tastes) is a big win on all counts and I highly recommend it.
```perl
#!/usr/bin/perl
use strict;
use warnings;
use Data::Dumper;
use HTML::TokeParser::Simple;

# slurp the sample HTML from the __DATA__ section
my $html;
{ local $/; $html = <DATA> }

my $p = HTML::TokeParser::Simple->new(\$html);
$p->unbroken_text(1);

my ($in_li, @record, @db);
while (my $t = $p->get_token){
    $in_li++, next if $t->is_start_tag('li');
    next unless $in_li;
    if ($t->is_end_tag('li')){
        push @db, [@record];
        @record = ();    # start a fresh record for the next <li>
        $in_li = 0;
        next;
    }
    if ($t->is_start_tag('a')){
        push @record, $t->get_attr('href');
        my $text = $p->get_trimmed_text('/a');
        push @record, $text;
    }
}
#die Dumper \@db;

# print the first link's text, the 2nd link's URL and the 2nd link's text
for my $record (@db){
    my @field = @{$record};
    print $field[1], "::", $field[2], "::", $field[3], "\n";
}

__DATA__
<li>
<a class="style5" href="http://www.site.com/page.html">
some words here
</a> -
<a class="style3" href="http://www.site.com/page2.html">
"some words here"
</a>
</li>
```

output:

```
> "C:\Perl\bin\perl.exe" _new.pl
some words here::http://www.site.com/page2.html::"some words here"
```
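To illustrate the "easy to adapt" point, here is a minimal sketch that assumes (hypothetically) the real page has several `<li>` items and that the `class` attributes `style5` and `style3` reliably identify the two links. The sample markup below is made up for illustration. It uses the same HTML::TokeParser::Simple calls as the script above, but builds one record per `<li>` and picks fields by class instead of by array position, so adding or reordering links doesn't break it:

```perl
#!/usr/bin/perl
use strict;
use warnings;
use HTML::TokeParser::Simple;

# Hypothetical input: the original markup repeated for two list items.
my $html = <<'HTML';
<ul>
<li>
  <a class="style5" href="http://www.site.com/page.html"> first item </a> -
  <a class="style3" href="http://www.site.com/page2.html"> "first quote" </a>
</li>
<li>
  <a class="style5" href="http://www.site.com/page3.html"> second item </a> -
  <a class="style3" href="http://www.site.com/page4.html"> "second quote" </a>
</li>
</ul>
HTML

my $p = HTML::TokeParser::Simple->new(\$html);
$p->unbroken_text(1);

my @db;       # one entry (array ref of links) per <li>
my $links;    # undef while we are outside an <li>
while (my $t = $p->get_token) {
    if ($t->is_start_tag('li')) {
        $links = [];    # fresh record for every <li>
        next;
    }
    next unless $links;
    if ($t->is_end_tag('li')) {
        push @db, $links;
        undef $links;
        next;
    }
    if ($t->is_start_tag('a')) {
        # keep every link's class, URL and trimmed text together
        push @$links, {
            class => $t->get_attr('class'),
            href  => $t->get_attr('href'),
            text  => $p->get_trimmed_text('/a'),
        };
    }
}

for my $record (@db) {
    # assumption: each class appears at most once per <li>
    my %by_class = map { ($_->{class} => $_) } @$record;
    print join('::',
        $by_class{style5}{text},
        $by_class{style3}{href},
        $by_class{style3}{text}), "\n";
}
```

If the spec changes again (say, a third link or another attribute), it's a one-line change to the hash or the `print`, rather than a rewrite of a regex.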
In reply to Re: 3 capture multi line regex by wfsp, in thread 3 capture multi line regex by Anonymous Monk