It seems as though you're having the same problem with get_text that I was having. It appears that when you call the get_text method, it "massages" the contents of the text. For instance, I had an item with a link in it, and it removed the link and just gave me the plain text. In your case it's converting the &nbsp; to something else (I got things like á2á on WinXP, ActivePerl 5.6). Here's a pseudo work-around:
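use strict;
use HTML::TokeParser;

local $/;    # slurp mode: read all of DATA in one go
my $lines = <DATA>;
my $p = HTML::TokeParser->new(\$lines);
while (my $token = $p->get_token) {
    # only look at text tokens ('T') and test their raw, unmassaged text
    print "$1\n" if ($token->[0] eq 'T' && $token->[1] =~ /^ (\d{1,2}) $/);
}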
__END__
<td>1</td>
<td> 2 </td>
<td>10</td>
<td> 20 </td>
Output:
2
20
Note: it will only work if you can guarantee that the data comes directly after the <td> tag (i.e., no <div>, <p>, etc.).
HTH
Update: Better code now. What I had was this: if you find a td tag, get the next token, which should be text, and see if it matches the pattern. Now, instead, I check whether the token is a text token and whether it matches our pattern. That should be more reliable.
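One thing to keep in mind: a check that only looks at text tokens will match text anywhere in the document, not just inside table cells. If that matters, here is a rough sketch (not tested against your real data) that combines both ideas by remembering whether we are currently inside a td. The names $html, $in_td and @found and the sample markup are made up for illustration; the only library calls used are HTML::TokeParser's new and get_token.

```perl
use strict;
use warnings;
use HTML::TokeParser;

# Hypothetical sample: numbers wrapped in extra tags, plus one outside any cell
my $html = '<td><p> 2 </p></td><td>10</td><p> 7 </p><td><b> 20 </b></td>';
my $p = HTML::TokeParser->new(\$html);

my $in_td = 0;    # true while we are between <td> and </td>
my @found;
while (my $token = $p->get_token) {
    my ($type, $value) = @$token;    # $value is the tag name for S/E tokens, the raw text for T tokens
    if    ($type eq 'S' && $value eq 'td') { $in_td = 1 }
    elsif ($type eq 'E' && $value eq 'td') { $in_td = 0 }
    elsif ($type eq 'T' && $in_td && $value =~ /^ (\d{1,2}) $/) {
        push @found, $1;             # text token inside a cell that matches the pattern
    }
}
print "$_\n" for @found;
```

With the sample above this should pick up the 2 and the 20, and skip both the 10 (no surrounding spaces) and the 7 (not inside a td). It does not try to cope with nested tables.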
Update 2: added use strict; :P
Update 2.5: code change thanks to Aristotle
--
Rock is dead. Long live paper and scissors!