in reply to Re: HTML::Parser question
in thread HTML::Parser question
    $self->{TEXT}.=$text;
to read as follows:
    $self->{TEXT}.="$text\n";
I tried your code with this mod, and the result might still not be exactly what you wanted: I saw "nbsp", HTML comments, and other "funny character" entities (©, etc.). I think you'll find a way to handle these with HTML::Entities. Also, depending on how far you want to go with filtering the Yahoo page content to get rid of irrelevant stuff (like the comments, the scripting, the forms, etc.), you might get good mileage out of HTML::TokeParser or its ::Simple variant (same functionality, different API).
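For what it's worth, here is a rough sketch of the kind of cleanup described above. It is not the original poster's code: it uses the HTML::Parser version 3 handler API rather than a subclass, and the sample `$html` string, the `$skip` counter, and the choice to drop whitespace-only chunks are all assumptions made up for illustration. It decodes entities with HTML::Entities, turns non-breaking spaces into plain spaces, and skips anything inside script or style elements. Comments are dropped simply because no comment handler is registered.

```perl
use strict;
use warnings;
use HTML::Parser ();
use HTML::Entities qw(decode_entities);

# Hypothetical sample input standing in for the fetched page.
my $html = '<html><body><!-- ad slot --><p>Copyright &copy; 2004</p>'
         . '<script>var x = 1;</script>'
         . '<p>Fish&nbsp;&amp;&nbsp;chips</p></body></html>';

my $text = '';
my $skip = 0;    # greater than zero while inside <script> or <style>

my $parser = HTML::Parser->new(
    api_version => 3,
    # Track entry into and exit from elements whose content we do not want.
    start_h => [ sub { $skip++ if $_[0] =~ /^(?:script|style)$/ }, 'tagname' ],
    end_h   => [ sub { $skip-- if $skip && $_[0] =~ /^(?:script|style)$/ }, 'tagname' ],
    text_h  => [ sub {
        return if $skip;                        # ignore script/style content
        my $chunk = decode_entities($_[0]);     # &copy; -> (c) sign, &amp; -> &
        $chunk =~ s/\xA0/ /g;                   # decoded &nbsp; -> plain space
        return unless $chunk =~ /\S/;           # drop whitespace-only chunks
        $text .= "$chunk\n";                    # one line per text chunk
    }, 'text' ],
);
$parser->unbroken_text(1);    # deliver each run of text as a single chunk
$parser->parse($html);
$parser->eof;

print $text;
```

The same filtering could probably be done with HTML::TokeParser by pulling tokens in a loop instead of registering callbacks; which reads better is mostly a matter of taste.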
Re: Re: Re: HTML::Parser question
by mkurtis (Scribe) on Mar 09, 2004 at 00:54 UTC |