in reply to HTML::Parser / Regex

I know that ... some other module might have easier way to do this. But for now, I want to learn and apply HTML::Parser and regex ...

OK, so you're committed to drilling all those holes in your head just to prove to yourself, for sure, that drilling holes in your head is a bad idea. Here's one regex-only approach (it uses Regexp::Common for the number pattern):

    c:\@Work\Perl\monks>perl -wMstrict -le
    "use warnings;
     use strict;
     ;;
     use Regexp::Common;
     ;;
     use Data::Dump qw(dd);
     ;;
     my @lines = (
       'Summary</h1><table border=\"1\"><tr><th>Employee John Doe</th><th>-0.82</th>',
       'Summary</h1><table border=\"1\"><tr><th> Employee Fred D. Poe </th><th> -5.03 </th>',
       'Summary</h1><table border=\"1\"><tr><th>Employee Billy-Bob Toe</th><th> </th>',
       'Summary</h1><table border=\"1\"><tr><th>Employee</th><th>999</th>',
       '<th>Employee Prince </th><th> 123</th>',
       '<th>Employee O</th><th> 1.23 </th>',
       );
     ;;
     my $rx_name     = qr{ \S+? (?: \s+ \S+)*? }xms;
     my $rx_th_open  = qr{ \s* < th > \s* }xms;
     my $rx_th_close = qr{ \s* < / th > \s* }xms;
     ;;
     my %per_employee;
     ;;
     LINE:
     for my $line (@lines) {
       my $parsed = my ($name, $amount) = $line =~ m{
         $rx_th_open Employee \s+ ($rx_name) $rx_th_close
         $rx_th_open ($RE{num}{real})? $rx_th_close
         }xms;
       ;;
       if (not $parsed) {
         warn qq{'$line' failed to parse};
         next LINE;
         }
       ;;
       $amount = 'no amount' unless defined $amount;
       $per_employee{$name} = $amount;
       }
     ;;
     dd \%per_employee;
    "
    'Summary</h1><table border="1"><tr><th>Employee</th><th>999</th>' failed to parse at -e line 1.
    {
      "Billy-Bob Toe" => "no amount",
      "Fred D. Poe" => "-5.03",
      "John Doe" => "-0.82",
      O => "1.23",
      Prince => 123,
    }
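The one-liner above needs Regexp::Common and Data::Dump, both from CPAN. If you'd rather stay with core Perl while experimenting, here is a sketch of the same approach under a couple of assumptions: a hand-rolled `$rx_real` (hypothetical, and much simpler than `$RE{num}{real}`; it only covers optionally signed decimals such as `-0.82` or `123`) replaces the Regexp::Common pattern, and the loop is wrapped in a hypothetical `parse_employees` sub so it's easy to test.

```perl
use strict;
use warnings;

# Hypothetical stand-in for Regexp::Common's $RE{num}{real}: optional sign,
# then digits with an optional fractional part, or a bare fraction like .5.
# No exponents, no digit grouping: just enough for the sample cells.
my $rx_real     = qr{ [+-]? (?: \d+ (?: \. \d* )? | \. \d+ ) }xms;

# Same (naive) building blocks as in the one-liner.
my $rx_name     = qr{ \S+? (?: \s+ \S+ )*? }xms;
my $rx_th_open  = qr{ \s* < th > \s* }xms;
my $rx_th_close = qr{ \s* < / th > \s* }xms;

# Hypothetical helper: returns a hash ref of name => amount.
# Lines that don't match are reported with warn and skipped.
sub parse_employees {
    my @lines = @_;
    my %per_employee;
    LINE:
    for my $line (@lines) {
        my $parsed = my ($name, $amount) = $line =~ m{
            $rx_th_open Employee \s+ ($rx_name) $rx_th_close
            $rx_th_open ($rx_real)? $rx_th_close
        }xms;
        if (not $parsed) {
            warn qq{'$line' failed to parse\n};
            next LINE;
        }
        $per_employee{$name} = defined $amount ? $amount : 'no amount';
    }
    return \%per_employee;
}

my $result = parse_employees(
    '<th>Employee John Doe</th><th>-0.82</th>',
    '<th> Employee Fred D. Poe </th><th> -5.03 </th>',
    '<th>Employee Billy-Bob Toe</th><th> </th>',
    '<th>Employee</th><th>999</th>',
);
print "$_ => $result->{$_}\n" for sort keys %$result;
```

With these cells the hash ends up with the same three entries as the matching lines of the original run (`John Doe`, `Fred D. Poe`, and `Billy-Bob Toe` with `no amount`), and the bare `Employee` cell is warned about and skipped. Unlike `$RE{num}{real}`, this `$rx_real` won't accept exponents such as `1e3`.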
(Note that the $rx_name regex is a very naive model of an actual human name. (Update: See the off-site article "Falsehoods Programmers Believe About Names".))
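To see just how naive `$rx_name` is: it only asks for one or more whitespace-separated runs of non-whitespace characters, so whatever sits in the cell comes along, markup and digits included. A small sketch reusing the one-liner's regexes inside a hypothetical `name_in_cell` helper, with made-up cells:

```perl
use strict;
use warnings;

# The same naive name pattern and <th> helpers as in the one-liner.
my $rx_name     = qr{ \S+? (?: \s+ \S+ )*? }xms;
my $rx_th_open  = qr{ \s* < th > \s* }xms;
my $rx_th_close = qr{ \s* < / th > \s* }xms;

# Hypothetical helper: returns the captured "name", or undef on no match.
sub name_in_cell {
    my ($cell) = @_;
    my ($name) = $cell =~ m{ $rx_th_open Employee \s+ ($rx_name) $rx_th_close }xms;
    return $name;
}

# Anything that isn't whitespace counts as part of a "name",
# so embedded tags and plain numbers are accepted too.
my $with_markup = name_in_cell('<th>Employee <b>Ann</b> Lee</th>');
my $number      = name_in_cell('<th>Employee 42</th>');
print "$with_markup\n$number\n";
```

The first cell yields `<b>Ann</b> Lee` (tags and all) and the second yields `42`. Tightening the pattern means deciding what a name may contain, which is exactly where the falsehoods start.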

Update: Significant changes to the example code: the $rx_th_open and $rx_th_close regexes made more elegant (?); rudimentary error handling added; corner-case and error test cases added.
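The rudimentary error handling hinges on the `my $parsed = my ($name, $amount) = ...` idiom. A list assignment in scalar context evaluates to the number of elements on its right-hand side. A successful match returns one element per capture group (undef for an optional group that didn't participate), so `$parsed` is 2 even when there's no amount; a failed match returns the empty list, so `$parsed` is 0. A minimal sketch with a hypothetical `try_parse` sub and made-up `key=value` inputs:

```perl
use strict;
use warnings;

# Hypothetical helper illustrating the idiom. The list assignment
# (my ($key, $value) = ...) in scalar context yields the number of
# elements the match returned: 2 on success (even if the optional
# (\d+)? group didn't participate), 0 on failure.
sub try_parse {
    my ($text) = @_;
    my $parsed = my ($key, $value) = $text =~ m{ (\w+) = (\d+)? }xms;
    return ($parsed, $key, $value);
}

for my $input ('width=42', 'width=', '!!!') {
    my ($parsed, $key, $value) = try_parse($input);
    printf "%-10s parsed=%d value=%s\n", "'$input'", $parsed,
        defined $value ? $value : 'undef';
}
```

For the three inputs `$parsed` is 2, 2 and 0, while `$value` is `42`, undef and undef. That is why the one-liner checks `$parsed` to detect a failed parse and checks `defined $amount` separately to detect a missing amount.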


Give a man a fish:  <%-{-{-{-<