Re: parsing with regex

in reply to parsing with regex

Everyone is so much quicker than me. Oh well, here's my attempt for what it's worth, constructed to handle some (by no means all) HTML-legal variations in the text: upper- or lowercase tags, <br />, and line breaks before or after a tag.

#!/usr/bin/perl -w
use strict;

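# Paragraph mode: read blank-line-separated chunks. The sample data has
# no blank lines, so it all arrives as a single record.
$/  = '';

my %h;

while (<DATA>){
  # Repeatedly cut out one "N is good ... <hr>" block; $1 is the block, $2 is N.
  while ( s/((\d+) is good.+?)<(?:hr|HR)>//s ){
    my $good  = $1;
    my $key   = $2;
    # Turn each <BR>, <br> or <br /> (plus adjacent newline) into a '|' separator.
    # Assumes the data itself never contains a literal '|'.
    $good     =~ s/\n?\s?<(?:BR|br).?.?>\n?/|/g;
    my @pot   = split /\|/, $good;
    shift @pot;               # drop the "N is good" heading itself
    $h{$key}  = [@pot];
  }
}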

use Data::Dumper;
print Data::Dumper->Dump([\%h],[qw(*h)]);

__DATA__
<HR>
1 is good<BR>
useless data<BR>useless data<BR>
useless data
<BR>useless data<BR>
<hr>
2 is good<BR>
useless data<br>
useless data<BR>
useless data<br>
useless data<BR>
<hr>
3 is not good <BR>
useless data
  <br />useless data<br />useless data<BR>
useless data<BR>
<HR>
4 is good<BR>
useless data<BR>useless data<BR>
useless data<br>useless data<BR>
<HR>
5 is not good <BR>
useless data<BR>
useless data<BR>
useless data<BR>
useless data<BR>
<HR>
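For what it's worth, the same idea can also be sketched with split instead of a destructive s///. This is only a rough variant under my own assumptions: parse_good is a made-up helper name, tags are matched case-insensitively, and a record only counts when its first piece is exactly "N is good" (stricter than the regex above, which matches that phrase anywhere before the next <hr>).

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not part of the script above): returns a hash ref
# mapping each "N is good" number to the list of pieces that follow it.
sub parse_good {
    my ($text) = @_;
    my %h;

    # Records are separated by <hr>, in any case, with an optional " /".
    for my $rec ( split /<hr\s*\/?>/i, $text ) {
        my @pot;

        # Pieces are separated by <br>, <BR>, <br/> or <br />.
        for my $piece ( split /<br\s*\/?>/i, $rec ) {
            $piece =~ s/^\s+//;     # trim leading whitespace and newlines
            $piece =~ s/\s+$//;     # trim trailing whitespace and newlines
            push @pot, $piece if length $piece;
        }

        next unless @pot;           # nothing but whitespace in this record
        my $head = shift @pot;

        # Assumption: the heading must be exactly "N is good".
        next unless $head =~ /^(\d+) is good$/;
        $h{$1} = [@pot];
    }
    return \%h;
}

# Small demonstration on made-up input in the same shape as the sample data.
my $sample = "<HR>\n1 is good<BR>\nfoo<br />bar<BR>\n<hr>\n"
           . "2 is not good <BR>\nbaz<BR>\n<HR>\n";
my $h = parse_good($sample);
print "$_: @{ $h->{$_} }\n" for sort keys %$h;    # prints "1: foo bar"
```

The trade-off is that trimming each piece makes the result less sensitive to stray whitespace around the tags, at the cost of also dropping any piece that is empty on purpose.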

By the way, you asked your question very well, complete with a good data example. It's appreciated.
(better Data::Dumper, thanks to sacked and tilly).
mkmcconn


In Section Seekers of Perl Wisdom