
The documentation you just quoted clearly states that the value of the key specifies which attribute the replacement text should be taken from (so by default, it replaces <img> tags by the text in their alt attribute). Since you want no replacement text, an empty string should be the appropriate choice. (Not that it matters, since <br> has no attributes to pick replacement text out of.)
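Here is the attribute lookup in action: a minimal sketch against a hypothetical two-image snippet (the markup and variable names are illustrative, not from the thread). With the parser's default textify settings, an <img> that has an alt attribute is replaced by that text. One without it comes out as the tag name in upper case, wrapped in brackets.

```perl
use strict;
use warnings;
use HTML::TokeParser;

# Hypothetical sample markup: one <img> with an alt attribute, one without.
my $html = '<p>A <img src="cat.png" alt="cat"> and <img src="dog.png"> here.</p>';

my $p = HTML::TokeParser->new(\$html);

# Skip past the opening <p> so get_text starts at the first text token.
$p->get_tag('p');

# Default textify: <img> is replaced by its alt text, or by "[IMG]"
# when the alt attribute is missing.
my $text = $p->get_text('/p');
print "$text\n";    # prints "A cat and [IMG] here."
```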

You could be more specific about “no avail” – what is happening and how does it contradict your expectations?

Makeshifts last the longest.

Re^4: Tokeparser Textify Command
by SpacemanSpiff (Sexton) on Nov 10, 2005 at 07:25 UTC
    Fair enough, I didn't articulate my problem thoroughly. Here's an example of the HTML I'm reading into the $text variable:

    <!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN">
    <DIV><FONT color="#0000ff" face="Arial" size="2"><SPAN class="540051301-30062000">Steve,</SPAN></FONT></DIV>
    <DIV><FONT color="#0000ff" face="Arial" size="2"><SPAN class="540051301-30062000"></SPAN></FONT>&nbsp;</DIV>
    <DIV><FONT color="#0000ff" face="Arial" size="2"><SPAN class="540051301-30062000">The picture was one that you pointed me at in the paper a couple of weeks ago. I don't have any pictures of mine yet.</SPAN></FONT></DIV>
    <DIV><FONT color="#0000ff" face="Arial" size="2"><SPAN class="540051301-30062000"></SPAN></FONT>&nbsp;</DIV>
    <DIV><FONT color="#0000ff" face="Arial" size="2"><SPAN class="540051301-30062000">Tom</SPAN></FONT></DIV>
    <BLOCKQUOTE>
    <DIV align="left" class="OutlookMessageHeader" dir="ltr"><FONT face="Tahoma" size="2">-----Message-----<BR><B>From:</B> eat@joes.com [mailto:eat@joes.com]<BR><B>Sent:</B> Monday, June 10, 2005 3:50 PM<BR><B>To:</B> google.com<BR><B>Subject:</B> Re: [Test] another test<BR><BR></DIV></FONT><TT>Tom wrote:<BR>&gt;OK, I finally figured out that you can post online at the website or just<BR>&gt;send an e-mail.<BR><BR>Oh and the pic... it looks like it was shot during an<BR>earthquake.&nbsp; :-)<BR><BR>Steve<BR></TT><TT>To unsubscribe from this group, send an email to:<BR>listmod@google.com<BR><BR></TT><BR></BLOCKQUOTE>
    <br><br>
    </div>
    </td></tr></table>

    To extract the text, I use the following line in my script:

    my $text = $stream->get_text ("/table");

    When printed later in the script, this gives the following:

    Steve, The picture was one that you pointed me at in the paper a couple of weeks ago. I don't have any pictures of mine yet. Tom -----Message-----From:eat@joes.com [mailto:eat@joes.com] Sent:Monday, June 10, 2005 3:50 PM To: google.com Subject: Re: [Test] another test Tom wrote: OK, I finally figured out that you can post online at the website or just send an e-mail. Oh and the pic... it looks like it was shot during an earthquake. :-) Steve To unsubscribe from this group, send an email to: listmod@xxxxx.com

    All of the HTML is stripped by the nature of the operation, and that's great. However, I'd like to keep the BR tags so that when I reimport the data elsewhere, it retains the formatting of the original (notice how the text at the bottom is all smashed together with no line breaks).

    So by "no avail" I meant that I was still getting the text all squashed together, as above.

    Hope that makes a little more sense.
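The squashing comes from get_text itself. A tag with no textify entry is dropped from the result, leaving at most a space behind. A minimal sketch on a hypothetical one-paragraph snippet (not the OP's data) shows the effect on a <br>:

```perl
use strict;
use warnings;
use HTML::TokeParser;

# Hypothetical two-line snippet separated by a <br>.
my $html = '<p>first line<br>second line</p>';

my $p = HTML::TokeParser->new(\$html);
$p->get_tag('p');

# <br> is not in the default textify hash (only img and applet are),
# so get_text drops the tag and the two lines end up on one line.
my $text = $p->get_text('/p');
print "$text\n";
```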

      If that's the output you're getting, you're not using textify. Aristotle confused $text with $stream from your code, but you shouldn't: textify is a setting on the parser object, $stream->{textify}.
      use strict;
      use warnings;

      my $html =<<'__BOB__';
      <!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN">
      <DIV><FONT color="#0000ff" face="Arial" size="2"><SPAN class="540051301-30062000">Steve,</SPAN></FONT></DIV>
      <DIV><FONT color="#0000ff" face="Arial" size="2"><SPAN class="540051301-30062000"></SPAN></FONT>&nbsp;</DIV>
      <DIV><FONT color="#0000ff" face="Arial" size="2"><SPAN class="540051301-30062000">The picture was one that you pointed me at in the paper a couple of weeks ago. I don't have any pictures of mine yet.</SPAN></FONT></DIV>
      <DIV><FONT color="#0000ff" face="Arial" size="2"><SPAN class="540051301-30062000"></SPAN></FONT>&nbsp;</DIV>
      <DIV><FONT color="#0000ff" face="Arial" size="2"><SPAN class="540051301-30062000">Tom</SPAN></FONT></DIV>
      <BLOCKQUOTE>
      <DIV align="left" class="OutlookMessageHeader" dir="ltr"><FONT face="Tahoma" size="2">-----Message-----<BR><B>From:</B> eat@joes.com [mailto:eat@joes.com]<BR><B>Sent:</B> Monday, June 10, 2005 3:50 PM<BR><B>To:</B> google.com<BR><B>Subject:</B> Re: [Test] another test<BR><BR></DIV></FONT><TT>Tom wrote:<BR>&gt;OK, I finally figured out that you can post online at the website or just<BR>&gt;send an e-mail.<BR><BR>Oh and the pic... it looks like it was shot during an<BR>earthquake.&nbsp; :-)<BR><BR>Steve<BR></TT><TT>To unsubscribe from this group, send an email to:<BR>listmod@google.com<BR><BR></TT><BR></BLOCKQUOTE>
      <br><br>
      </div>
      </td></tr></table>
      __BOB__

      use HTML::TokeParser;

      {
          my $stream = HTML::TokeParser->new( \$html );
          # A string value names the attribute to take the replacement text from;
          # <br> has no such attribute, so each tag comes out as "[BR]".
          $stream->{textify} = { br => '' };
          my $text = $stream->get_text ("/table");
          warn $text;
      }
      {
          my $stream = HTML::TokeParser->new( \$html );
          # A code ref is called with the start-tag token ('S', tag, attrs, ...)
          # and its return value replaces the tag in the extracted text.
          $stream->{textify} = {
              br => sub {
                  my $t = \@_;
                  if( $t->[0] eq 'S' and $t->[1] eq 'br') {
                      return '<br>';
                  }
                  return;
              }
          };
          my $text = $stream->get_text ("/table");
          warn $text;
      }
      __END__
      Steve,   The picture was one that you pointed me at in the paper a couple of weeks ago. I don't have any pictures of mine yet.   
      Tom -----Message-----[BR]From: eat@joes.com [mailto:eat@joes.com][BR]Sent: Monday, June 10, 2005 3:50 PM[BR]To: google.com[BR]Subject: Re: [Test] another test[BR][BR] Tom wrote:[BR]>OK, I finally figured out that you can post online at the website or just[BR]>send an e-mail.[BR][BR]Oh and the pic... it looks like it was shot during an[BR]earthquake.  :-)[BR][BR]Steve[BR]To unsubscribe from this group, send an email to:[BR]listmod@google.com[BR][BR][BR] [BR][BR] at html.tokeparser.textify.pl line 27.
      Steve,   The picture was one that you pointed me at in the paper a couple of weeks ago. I don't have any pictures of mine yet.   
      Tom -----Message-----<br>From: eat@joes.com [mailto:eat@joes.com]<br>Sent: Monday, June 10, 2005 3:50 PM<br>To: google.com<br>Subject: Re: [Test] another test<br><br> Tom wrote:<br>>OK, I finally figured out that you can post online at the website or just<br>>send an e-mail.<br><br>Oh and the pic... it looks like it was shot during an<br>earthquake.  :-)<br><br>Steve<br>To unsubscribe from this group, send an email to:<br>listmod@google.com<br><br><br> <br><br> at html.tokeparser.textify.pl line 46.
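      If the aim is real line breaks in plain text rather than literal <br> markup, the same code-ref hook can simply return a newline. Below is a minimal sketch, using a short hypothetical excerpt of the message instead of the full HTML; the variable names are illustrative.

```perl
use strict;
use warnings;
use HTML::TokeParser;

# Hypothetical excerpt standing in for the full Outlook HTML.
my $html = '<p>Tom wrote:<br>&gt;send an e-mail.<br><br>Steve</p>';

my $p = HTML::TokeParser->new(\$html);
$p->get_tag('p');

# textify lives on the parser object. Each <br> start tag is replaced
# by whatever the code ref returns, here a plain newline.
$p->{textify} = { br => sub { return "\n" } };

# get_text also decodes entities, so &gt; comes back as ">".
my $text = $p->get_text('/p');
print "$text\n";
```

      Whether "\n" or '<br>' is the better replacement depends on what the data is re-imported into; the get_text call itself stays the same.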

      MJD says "you can't just make shit up and expect the computer to know what you mean, retardo!"
      I run a Win32 PPM repository for perl 5.6.x and 5.8.x -- I take requests (README).
      ** The third rule of perl club is a statement of fact: pod is sexy.