Re: TokeParser

It seems as though you're having the same problem with get_text that I was having. When you call the get_text method, it "massages" the contents of the text. For instance, I had an item with a link in it, and get_text removed the link and gave me just the plain text. In your case it's converting the &nbsp; to something else (I got things like á2á on WinXP, ActivePerl 5.6). Here's a pseudo work-around:

use HTML::TokeParser;
use strict;

local $/;
my $lines = <DATA>;
my $p = HTML::TokeParser->new(\$lines);
while (my $token = $p->get_token) {
    # Text tokens have type 'T'; their raw text still contains the literal &nbsp; entities
    print "$1\n" if ($token->[0] eq 'T' && $token->[1] =~ /^&nbsp;(\d{1,2})&nbsp;$/);
}
__END__
<td>1</td>
<td>&nbsp;2&nbsp;</td>
<td>10</td>
<td>&nbsp;20&nbsp;</td>

Output:

2
20
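If you'd rather keep using get_text, here is a minimal sketch of the other direction: match the decoded character instead of the entity. It assumes get_text decodes &nbsp; into the single character chr(160) (a non-breaking space), which would also explain the á I saw, since byte 0xA0 displays as á in the usual Windows console code pages. The inline $html string and the @found array are just names made up for the illustration.

```perl
use strict;
use warnings;
use HTML::TokeParser;

# Sample markup, same cells as in the __END__ section above
my $html = '<td>1</td><td>&nbsp;2&nbsp;</td><td>10</td><td>&nbsp;20&nbsp;</td>';

my $p = HTML::TokeParser->new(\$html);
my @found;

# Jump to each <td> start tag, then collect the text up to its </td>
while ($p->get_tag('td')) {
    my $text = $p->get_text('/td');

    # Assumption: &nbsp; has been decoded to chr(160), so match \xA0
    push @found, $1 if $text =~ /^\xA0(\d{1,2})\xA0$/;
}

print "$_\n" for @found;
```

Unlike the get_token version, this one only looks at text that sits directly inside a td cell.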

~~Note: it will only work if you can guarantee that the data comes directly after the <td> tag (ie, no <div>, <p>, etc..)~~

HTH

Update: Better code now. What I had before was this: if you find a td tag, get the next token, which should be text, and see if it matches the pattern. Now, instead, I check whether each token is a text token and whether it matches our pattern. That should be more reliable.

Update 2: added use strict; :P

Update 2.5: code change thanks to Aristotle

--
Rock is dead. Long live paper and scissors!

Re^2: TokeParser by Aristotle (Chancellor) on Oct 26, 2002 at 22:03 UTC
Quick note: `local($/) = undef;` is the same as `local $/;` :-)

Makeshifts last the longest.
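To illustrate the quick note, here is a small sketch that slurps the same data both ways and compares the results. It uses an in-memory filehandle (Perl 5.8 or later) so it runs without any files; the sub names slurp_implicit and slurp_explicit are made up for the example.

```perl
use strict;
use warnings;

my $data = "line one\nline two\n";

# local $/; leaves the input record separator undefined (slurp mode)
sub slurp_implicit {
    open my $fh, '<', \$data or die "open failed: $!";
    local $/;
    return scalar <$fh>;
}

# Explicitly assigning undef has the same effect
sub slurp_explicit {
    open my $fh, '<', \$data or die "open failed: $!";
    local($/) = undef;
    return scalar <$fh>;
}

print slurp_implicit() eq slurp_explicit() ? "same\n" : "different\n";
```

Both subs should return the whole string in one read, since localizing $/ without an initial value already resets it to undef.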