in reply to Tokeparser Textify Command

Seems like you need

$text->{textify} = { br => '' };
This is nothing specific to the textify feature, it’s simply an anonymous hash. See perlreftut.
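For anyone unfamiliar with the syntax, here is a minimal sketch in core Perl of what the braces do. `$obj` is a hypothetical stand-in for any hash-based object, not the real parser, and `%map` and `$other` exist only for comparison.

```perl
use strict;
use warnings;

# Hypothetical stand-in for a hash-based object such as a parser.
my $obj = {};

# Curly braces in this position build a new hash and return a reference to it.
$obj->{textify} = { br => '' };

# The same structure spelled out with a named hash and an explicit reference.
my %map   = ( br => '' );
my $other = { textify => \%map };

print ref( $obj->{textify} ), "\n";                        # prints "HASH"
print join( ',', sort keys %{ $obj->{textify} } ), "\n";   # prints "br"
```

Both forms end up with a hash reference stored under the `textify` key; the anonymous form just skips the named variable.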

Makeshifts last the longest.

Re^2: Tokeparser Textify Command
by SpacemanSpiff (Sexton) on Nov 10, 2005 at 05:52 UTC
    wouldn't $text->{textify} = { br => '' }; swap br tags with a blank space? i've tried that as well as a few variations (like putting br in the single quotes or just having br alone in the curly brackets), all to no avail.

      The documentation you just quoted clearly states that the value of the key specifies which attribute the replacement text should be taken from (so by default, it replaces <img> tags by the text in their alt attribute). Since you want no replacement text, an empty string should be the appropriate choice. (Not that it matters, since <br> has no attributes to pick replacement text out of.)
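A small sketch of the attribute-name behaviour described above, assuming HTML::TokeParser is installed. The sample HTML and the `title` attribute are made up for illustration; the point is only that the value in the hash names the attribute the text is taken from.

```perl
use strict;
use warnings;
use HTML::TokeParser;

# Made-up sample input; only the <img> tag matters here.
my $html = 'before <img src="pic.png" alt="a picture" title="hover text"> after';

my %result;
for my $attr ( 'alt', 'title' ) {
    # A fresh parser per run, reading from a string reference.
    my $p = HTML::TokeParser->new( \$html ) or die "cannot create parser";

    # textify is set on the parser object, the one get_text is called on.
    $p->{textify} = { img => $attr };

    $result{$attr} = $p->get_text;
    print "$attr: $result{$attr}\n";
}
```

With `alt` the image is replaced by "a picture", with `title` by "hover text".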

      You could be more specific about “no avail” – what is happening and how does it contradict your expectations?

      Makeshifts last the longest.

        fair enough, i didn't articulate my problem thoroughly. here's an example of the HTML i'm reading into the $text variable:

        <!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN">
        <DIV><FONT color="#0000ff" face="Arial" size="2"><SPAN class="540051301-30062000">Steve,</SPAN></FONT></DIV>
        <DIV><FONT color="#0000ff" face="Arial" size="2"><SPAN class="540051301-30062000"></SPAN></FONT>&nbsp;</DIV>
        <DIV><FONT color="#0000ff" face="Arial" size="2"><SPAN class="540051301-30062000">The picture was one that you pointed me at in the paper a couple of weeks ago. I don't have any pictures of mine yet.</SPAN></FONT></DIV>
        <DIV><FONT color="#0000ff" face="Arial" size="2"><SPAN class="540051301-30062000"></SPAN></FONT>&nbsp;</DIV>
        <DIV><FONT color="#0000ff" face="Arial" size="2"><SPAN class="540051301-30062000">Tom</SPAN></FONT></DIV>
        <BLOCKQUOTE>
        <DIV align="left" class="OutlookMessageHeader" dir="ltr"><FONT face="Tahoma" size="2">-----Message-----<BR><B>From:</B> eat@joes.com [mailto:eat@joes.com]<BR><B>Sent:</B> Monday, June 10, 2005 3:50 PM<BR><B>To:</B> google.com<BR><B>Subject:</B> Re: [Test] another test<BR><BR></DIV></FONT><TT>Tom wrote:<BR>&gt;OK, I finally figured out that you can post online at the website or just<BR>&gt;send an e-mail.<BR><BR>Oh and the pic... it looks like it was shot during an<BR>earthquake.&nbsp; :-)<BR><BR>Steve<BR></TT><TT>To unsubscribe from this group, send an email to:<BR>listmod@google.com<BR><BR></TT><BR></BLOCKQUOTE>
        <br><br>
        </div>
        </td></tr></table>

        to do that, i use the following line in my script:

        my $text = $stream->get_text ("/table");

        this returns the following printed later in the script:

        Steve, The picture was one that you pointed me at in the paper a couple of weeks ago. I don't have any pictures of mine yet. Tom -----Message-----From:eat@joes.com [mailto:eat@joes.com] Sent:Monday, June 10, 2005 3:50 PM To: google.com Subject: Re: [Test] another test Tom wrote: OK, I finally figured out that you can post online at the website or just send an e-mail. Oh and the pic... it looks like it was shot during an earthquake. :-) Steve To unsubscribe from this group, send an email to: listmod@xxxxx.com

        all of the HTML is stripped by nature of the operation, and that's great. i'm looking to keep the BR tags, however, so when i reimport the data elsewhere, it retains the formatting of the original (notice how the text at the bottom is all smashed together with no line breaks).

        so by "no avail" earlier, i meant i was still getting the text all squashed together as above.

        hope that made a little more sense.
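One possible way to keep the line breaks, offered as a sketch under assumptions rather than a confirmed fix: the HTML::TokeParser documentation describes textify values as either an attribute name or a subroutine reference whose return value is used as the text. The HTML below is a shortened, made-up stand-in for the message above, `$stream` is built from a string instead of the real file, and the `get_tag("td")` call is only there to position the parser for this sample.

```perl
use strict;
use warnings;
use HTML::TokeParser;

# Shortened, hypothetical stand-in for the message HTML.
my $html = '<table><tr><td><TT>Tom wrote:<BR>&gt;OK<BR><BR>Steve<BR></TT>'
         . '</td></tr></table>';

my $stream = HTML::TokeParser->new( \$html ) or die "cannot create parser";

# Set textify on the parser object before calling get_text.
# Tag names are lowercased by the parser, so the key is 'br'.
# The code reference returns the literal markup to keep for each <br>.
$stream->{textify} = { br => sub { "<br>\n" } };

# Skip ahead to the cell, then collect text up to the closing table tag.
$stream->get_tag("td");
my $text = $stream->get_text("/table");

print $text, "\n";
```

The returned string could just as well be a bare "\n" if the goal is plain-text line breaks rather than markup.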