Everyone is so much quicker than me. Oh well, here's my attempt for what it's worth, constructed to handle some (by no means all) HTML-legal variations in the text.
#!/usr/bin/perl -w
use strict;

$/ = '';    # paragraph mode: DATA has no blank lines, so it arrives as one record
my %h;
while (<DATA>){
    # Strip off each "N is good ... <hr>" block in turn; "N is not good" never matches.
    while ( s/((\d+) is good.+?)<(?:hr|HR)>//s ){
        my $good = $1;
        my $key  = $2;
        # Turn every <BR>, <br> or <br /> (plus adjacent newline) into a '|' separator.
        $good =~ s/\n?\s?<(?:BR|br).?.?>\n?/|/g;
        my @pot = split /\|/, $good;
        shift @pot;             # discard the "N is good" header line
        $h{$key} = [@pot];
    }
}
use Data::Dumper;
print Data::Dumper->Dump([\%h],[qw(*h)]);
__DATA__
<HR>
1 is good<BR>
useless data<BR>useless data<BR>
useless data <BR>useless data<BR>
<hr>
2 is good<BR>
useless data<br>
useless data<BR>
useless data<br>
useless data<BR>
<hr>
3 is not good <BR>
useless data <br />useless data<br />useless data<BR>
useless data<BR>
<HR>
4 is good<BR>
useless data<BR>useless data<BR>
useless data<br>useless data<BR>
<HR>
5 is not good <BR>
useless data<BR>
useless data<BR>
useless data<BR>
useless data<BR>
<HR>
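
For anyone who needs to cope with a few more of the HTML-legal variations (mixed case such as <Br> or <Hr />, or tags carrying attributes), here is a rough sketch of one way to loosen the matching. It is not the code above, just a variant under some assumptions: the helper name parse_good_blocks and the sample data are made up for illustration, it splits on the tags instead of substituting them away, and the s///r flag needs Perl 5.14 or later.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not from the original post): returns a hash ref
# mapping each "N is good" number to the list of entries that follow it.
sub parse_good_blocks {
    my ($html) = @_;
    my %h;
    # Split into records on any <hr> variant: <HR>, <hr>, <Hr />, <hr size="1"> ...
    for my $record ( split /<hr\b[^>]*>/i, $html ) {
        # Keep only records that start with "<digits> is good".
        next unless $record =~ /^\s*(\d+) is good\b/;
        my $key = $1;
        # Split on any <br> variant, then trim whitespace from each piece.
        my @pot = map { s/^\s+|\s+$//gr } split /<br\b[^>]*>/i, $record;
        shift @pot;                            # drop the "N is good" header
        $h{$key} = [ grep { length } @pot ];   # discard empty pieces
    }
    return \%h;
}

# Made-up sample data, assumed to resemble the format in the question.
my $sample = join '', <DATA>;
my $h = parse_good_blocks($sample);
for my $key ( sort { $a <=> $b } keys %$h ) {
    print "$key: ", join( ' | ', @{ $h->{$key} } ), "\n";
}
__DATA__
<HR>
1 is good<BR>
first<br />second<Br>
<hr>
2 is not good<BR>
skipped<BR>
<Hr />
3 is good<br>
third<BR>
<HR>
```

Unlike the original, this trims whitespace from every entry and throws away empty ones, so it is only a drop-in replacement if that is acceptable. For anything much messier than this, a real HTML parser is probably the safer route.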
By the way, you asked your question very well, complete with a good data example. It's appreciated.
(Better Data::Dumper usage, thanks to sacked and tilly.)
mkmcconn
In reply to Re: parsing with regex by mkmcconn, in thread parsing with regex by 2501