SpacemanSpiff has asked for the wisdom of the Perl Monks concerning the following question:
The text might span tags that should be textified. This is controlled by the $p->{textify} attribute, which is a hash that defines how certain tags can be treated as text. If the name of a start tag matches a key in this hash then this tag is converted to text. The hash value is used to specify which tag attribute to obtain the text from. If this tag attribute is missing, then the upper case name of the tag enclosed in brackets is returned, e.g. "[IMG]". The hash value can also be a subroutine reference. In this case the routine is called with the start tag token content as its argument and the return value is treated as the text.
The default $p->{textify} value is: {img => "alt", applet => "alt"}. This means that <IMG> and <APPLET> tags are treated as text, and that the text to substitute can be found in the ALT attribute.
OK, so I'm using the following command to grab the text between the previously fetched tag and the next </table> tag:
    my $text = $stream->get_text("/table");
I want the script to ignore all <br> tags within the retrieved text, but wipe out the rest of the HTML. After reading the above, the best option in my case seems to be textify (HTML is naturally wiped out by TokeParser). The question is, how do I specify the tags I want ignored? $text->{textify}("br");? Can someone more familiar with this command set help me out?
Thankyas!
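For reference, here is a minimal sketch of how the textify hash described in the quoted documentation might be extended. It assumes that "ignore" means the <br> tags should survive in the returned text, and it uses a made-up HTML snippet; the sample markup and the choice of returning the literal string "<br>" are assumptions for illustration, not something taken from the thread.

```perl
use strict;
use warnings;
use HTML::TokeParser;

# Hypothetical sample input, only for illustration.
my $html = '<table><tr><td>one<br>two <img src="a.png"> <b>three</b></td></tr></table>';

my $stream = HTML::TokeParser->new(\$html)
    or die "Cannot create parser";

# textify is a hash on the parser object, not a method.
# Adding a key leaves the default img/applet entries in place.
# A code ref value is called with the start tag token content,
# and whatever it returns is used as the text for that tag.
$stream->{textify}{br} = sub { "<br>" };

$stream->get_tag("table");                  # position just after <table>
my $text = $stream->get_text("/table");     # read up to the next </table>

# <br> is kept, <b> is dropped, and the <img> without an ALT
# attribute becomes "[IMG]": the text contains "one<br>two [IMG] three".
print "$text\n";
```

Note that the hash lives on the parser object ($stream here), not on the string returned by get_text. If the <br> tags should vanish entirely instead, the sub could return an empty string or a single space.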
Re: Tokeparser Textify Command
by Aristotle (Chancellor) on Nov 10, 2005 at 04:15 UTC
by SpacemanSpiff (Sexton) on Nov 10, 2005 at 05:52 UTC
by Aristotle (Chancellor) on Nov 10, 2005 at 06:15 UTC
by SpacemanSpiff (Sexton) on Nov 10, 2005 at 07:25 UTC
by PodMaster (Abbot) on Nov 10, 2005 at 09:09 UTC