I know that ... some other module might have an easier way to do this. But for now, I want to learn and apply HTML::Parser and regex ...

Ok, so you're committed to drilling all those holes in your head just to prove to yourself for sure that drilling holes in your head is a bad idea. Here's one approach:

c:\@Work\Perl\monks>perl -wMstrict -le
"use warnings;
 use strict;
 ;;
 use Regexp::Common;
 ;;
 use Data::Dump qw(dd);
 ;;
 my @lines = (
   'Summary</h1><table border=\"1\"><tr><th>Employee John Doe</th><th>-0.82</th>',
   'Summary</h1><table border=\"1\"><tr><th> Employee Fred D. Poe </th><th> -5.03 </th>',
   'Summary</h1><table border=\"1\"><tr><th>Employee Billy-Bob Toe</th><th> </th>',
   'Summary</h1><table border=\"1\"><tr><th>Employee</th><th>999</th>',
   '<th>Employee Prince </th><th> 123</th>',
   '<th>Employee O</th><th> 1.23 </th>',
   );
 ;;
 my $rx_name     = qr{ \S+? (?: \s+ \S+)*? }xms;
 my $rx_th_open  = qr{ \s* < th > \s* }xms;
 my $rx_th_close = qr{ \s* < / th > \s* }xms;
 ;;
 my %per_employee;
 ;;
 LINE:
 for my $line (@lines) {
   my $parsed = my ($name, $amount) = $line =~ m{
     $rx_th_open Employee \s+ ($rx_name) $rx_th_close
     $rx_th_open ($RE{num}{real})? $rx_th_close
     }xms;
   ;;
   if (not $parsed) {
     warn qq{'$line' failed to parse};
     next LINE;
     }
   ;;
   $amount = 'no amount' unless defined $amount;
   $per_employee{$name} = $amount;
   }
 ;;
 dd \%per_employee;
"
'Summary</h1><table border="1"><tr><th>Employee</th><th>999</th>' failed to parse at -e line 1.
{
  "Billy-Bob Toe" => "no amount",
  "Fred D. Poe" => "-5.03",
  "John Doe" => "-0.82",
  O => "1.23",
  Prince => 123,
}
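For anyone who doesn't have Regexp::Common or Data::Dump installed, here is a minimal core-Perl-only sketch of the same loop, written as a script so there is no cmd.exe quoting to worry about. The $rx_real pattern is a hypothetical stand-in for $RE{num}{real}, and it is much narrower: optional sign, digits, optional decimal fraction, no exponents, no bare ".5". The my $parsed = my ($name, $amount) = ... line works because a list assignment in scalar context yields the number of values on its right-hand side: a successful match returns both capture groups (the second may be undef), so $parsed is 2; a failed match returns an empty list, so $parsed is 0.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# A subset of the test rows above, minus the shell escaping.
my @lines = (
    '<th>Employee John Doe</th><th>-0.82</th>',
    '<th> Employee Fred D. Poe </th><th> -5.03 </th>',
    '<th>Employee Billy-Bob Toe</th><th> </th>',
    '<th>Employee</th><th>999</th>',
);

my $rx_name     = qr{ \S+? (?: \s+ \S+)*? }xms;
my $rx_th_open  = qr{ \s* < th > \s* }xms;
my $rx_th_close = qr{ \s* < / th > \s* }xms;

# Hypothetical stand-in for Regexp::Common's $RE{num}{real}:
# optional sign, digits, optional decimal fraction only.
my $rx_real = qr{ [-+]? \d+ (?: \. \d+ )? }xms;

my %per_employee;
my @failed;

LINE:
for my $line (@lines) {
    # Scalar value of the list assignment: 2 on a match (even when
    # $amount is undef), 0 when the match fails.
    my $parsed = my ($name, $amount) = $line =~ m{
        $rx_th_open Employee \s+ ($rx_name) $rx_th_close
        $rx_th_open ($rx_real)? $rx_th_close
    }xms;

    if (not $parsed) {
        push @failed, $line;
        warn "'$line' failed to parse\n";
        next LINE;
    }

    $per_employee{$name} = defined $amount ? $amount : 'no amount';
}

# Core-only substitute for Data::Dump's dd(): sorted key/value pairs.
print "$_ => $per_employee{$_}\n" for sort keys %per_employee;
```

Swapping $rx_real back to $RE{num}{real} (after use Regexp::Common;) restores the broader number matching of the original.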
(Note that $rx_name is a very naive regex for an actual human name. Update: See the off-site article "Falsehoods Programmers Believe About Names".)
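To make "very naive" concrete, the short sketch below feeds the same $rx_name pattern some made-up cells that are plainly not names. Because \S matches any non-whitespace character, digits, punctuation and even leftover markup are all accepted. About the only work the pattern does is stop short of trailing whitespace before the closing </th>.

```perl
use strict;
use warnings;

my $rx_name = qr{ \S+? (?: \s+ \S+)*? }xms;    # same pattern as above

# Made-up cells, purely for illustration: none of these is a real name.
my @cells = (
    '<th>Employee 42</th>',
    '<th>Employee <b>Ann</b></th>',
    '<th>Employee   ???   </th>',
);

my @captured;
for my $cell (@cells) {
    if (my ($name) = $cell =~ m{ <th> \s* Employee \s+ ($rx_name) \s* </th> }xms) {
        push @captured, $name;
        print "captured: '$name'\n";
    }
}
```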

Update: Significant changes to example code: the $rx_th_open and $rx_th_close regexes made more elegant (?); added rudimentary error handling; added corner-case and error test cases.
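One consequence of writing $rx_th_open and $rx_th_close with the /x modifier is worth spelling out: the spaces inside < th > and < / th > are ignored, so the patterns match only the literal tags <th> and </th>, plus any surrounding whitespace. A tag with attributes, whitespace inside the angle brackets, or upper-case <TH> (there is no /i) will not match. The quick check below uses test strings made up for illustration.

```perl
use strict;
use warnings;

my $rx_th_open  = qr{ \s* < th > \s* }xms;    # /x: pattern spaces are ignored
my $rx_th_close = qr{ \s* < / th > \s* }xms;

my %open_matches;
for my $s ('<th>', '  <th>  ', '< th >', '<th class="x">', '<TH>') {
    # Anchor to the whole string so nothing else can satisfy the match.
    $open_matches{$s} = ($s =~ m{ \A $rx_th_open \z }xms) ? 1 : 0;
    printf "%-16s %s\n", "'$s'", $open_matches{$s} ? 'matches' : 'no match';
}

my $close_ok = ('</th> ' =~ m{ \A $rx_th_close \z }xms) ? 1 : 0;
print "'</th> ' ", ($close_ok ? 'matches' : 'no match'), "\n";
```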


Give a man a fish:  <%-{-{-{-<


In reply to Re: HTML::Parser / Regex by AnomalousMonk
in thread HTML::Parser / Regex by MissPerl
