How do you specify the tags to textify? Here's the 411 from the manpages:

The text might span tags that should be textified. This is controlled by the $p->{textify} attribute, which is a hash that defines how certain tags can be treated as text. If the name of a start tag matches a key in this hash then this tag is converted to text. The hash value is used to specify which tag attribute to obtain the text from. If this tag attribute is missing, then the upper case name of the tag enclosed in brackets is returned, e.g. "[IMG]". The hash value can also be a subroutine reference. In this case the routine is called with the start tag token content as its argument and the return value is treated as the text.

The default $p->{textify} value is: {img => "alt", applet => "alt"}. This means that <IMG> and <APPLET> tags are treated as text, and that the text to substitute can be found in the ALT attribute.
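
As far as I can tell from that description, the two kinds of hash value would look something like the sketch below. The sample markup, the choice of the <a> tag, and the "[link to ...]" wording are all made up for illustration; only new, get_tag, get_text and the textify hash come from the manpage. Since I'm not certain of the exact position of each argument passed to the subroutine, the sketch just picks out the attribute hash by its type.

```perl
use strict;
use warnings;
use HTML::TokeParser;

# Made-up sample markup, purely for illustration.
my $html = '<table><tr><td>Logo: <img src="logo.png" alt="ACME"> see '
         . '<a href="http://example.com/">here</a></td></tr></table>';

my $p = HTML::TokeParser->new(\$html)
    or die "Could not create parser";

# Replace the default textify hash with our own.
$p->{textify} = {
    # String value: use this attribute of the start tag as the text.
    img => 'alt',
    # Subroutine value: called with the start tag token content;
    # whatever it returns is used as the text.
    a   => sub {
        # Assumption: the attribute hash is the only hash ref among the args.
        my ($attr) = grep { ref($_) eq 'HASH' } @_;
        my $href = defined $attr->{href} ? $attr->{href} : '?';
        return "[link to $href] ";
    },
};

$p->get_tag('table');                  # skip to the opening <table>
my $text = $p->get_text('/table');     # text up to the closing </table>
print "$text\n";
```

Assigning to $p->{textify} replaces the default hash entirely, so img has to be listed again if the ALT text should still come through.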

Ok, so I'm using the following command to grab the text between the previously fetched tag and the next </table> tag:

my $text = $stream->get_text("/table");

I want the script to ignore all <br> tags within the retrieved text (that is, leave them in place), but wipe out the rest of the HTML. After reading the above, the best option in my case seems to be textify, since TokeParser already strips out HTML on its own. The question is, how do I specify the tags I want ignored? Is it something like $text->{textify}("br");? Can someone more familiar with this module help me out?

Thankyas!


In reply to Tokeparser Textify Command by SpacemanSpiff
