fair enough, i didn't articulate my problem thoroughly. here's an example of the HTML i'm reading into the $text variable:
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN">
<DIV><FONT color="#0000ff" face="Arial" size="2"><SPAN class="540051301-30062000">Steve,</SPAN></FONT></DIV>
<DIV><FONT color="#0000ff" face="Arial" size="2"><SPAN class="540051301-30062000"></SPAN></FONT> </DIV>
<DIV><FONT color="#0000ff" face="Arial" size="2"><SPAN class="540051301-30062000">The picture was one that you pointed me at in the paper a couple of weeks ago. I don't have any pictures of mine yet.</SPAN></FONT></DIV>
<DIV><FONT color="#0000ff" face="Arial" size="2"><SPAN class="540051301-30062000"></SPAN></FONT> </DIV>
<DIV><FONT color="#0000ff" face="Arial" size="2"><SPAN class="540051301-30062000">Tom</SPAN></FONT></DIV>
<BLOCKQUOTE>
<DIV align="left" class="OutlookMessageHeader" dir="ltr"><FONT face="Tahoma" size="2">-----Message-----<BR><B>From:</B> eat@joes.com [mailto:eat@joes.com]<BR><B>Sent:</B> Monday, June 10, 2005 3:50 PM<BR><B>To:</B> google.com<BR><B>Subject:</B> Re: [Test] another test<BR><BR></DIV></FONT><TT>Tom wrote:<BR>>OK, I finally figured out that you can post online at the website or just<BR>>send an e-mail.<BR><BR>Oh and the pic... it looks like it was shot during an<BR>earthquake.&nbsp; :-)<BR><BR>Steve<BR></TT><TT>To unsubscribe from this group, send an email to:<BR>listmod@google.com<BR><BR></TT><BR></BLOCKQUOTE>
<br><br> </div>
</td></tr></table>
to pull the text out of that HTML, i use the following line in my script:
my $text = $stream->get_text ("/table");
this returns the following, which is printed later in the script:
Steve,
The picture was one that you pointed me at in the paper a couple of weeks ago. I don't have any pictures of mine yet.
Tom
-----Message-----From:eat@joes.com [mailto:eat@joes.com] Sent:Monday, June 10, 2005 3:50 PM To: google.com Subject: Re: [Test] another test Tom wrote: OK, I finally figured out that you can post online at the website or just send an e-mail. Oh and the pic... it looks like it was shot during an earthquake. :-) Steve To unsubscribe from this group, send an email to: listmod@xxxxx.com
all of the HTML is stripped by the nature of the operation, and that's great. i'm looking to keep the BR tags, however, so that when i reimport the data elsewhere it retains the formatting of the original (notice how the text at the bottom is all smashed together with no line breaks).
so when i said "to no avail" earlier, i meant i was still getting the text all squashed together, as above.
hope that made a little more sense.
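to make the goal concrete, here's a rough sketch of the behaviour i'm after: walk the tokens by hand instead of calling get_text, and write a literal <BR> wherever a BR start tag shows up. this is only a sketch under assumptions, not a confirmed fix: it assumes $stream is an HTML::TokeParser object, and the helper name get_text_keep_br and the sample $html string are made up for illustration.

```perl
use strict;
use warnings;
use HTML::TokeParser;
use HTML::Entities qw(decode_entities);

# hypothetical helper (not part of HTML::TokeParser): collect text up to
# the given end tag, written get_text-style as "/table", but emit a
# literal <BR> for every BR start tag instead of dropping it.
sub get_text_keep_br {
    my ($stream, $end_tag) = @_;
    my $out = '';
    while (my $token = $stream->get_token) {
        my $type = $token->[0];
        if ($type eq 'T') {
            # text token: decode entities unless it is already literal data
            my $text = $token->[1];
            decode_entities($text) unless $token->[2];
            $out .= $text;
        }
        elsif ($type eq 'S' && $token->[1] eq 'br') {
            # tag names arrive lowercased, so this matches <BR> and <br>
            $out .= "<BR>\n";
        }
        elsif ($type eq 'E' && "/$token->[1]" eq $end_tag) {
            # put the end tag back, as get_text does, and stop
            $stream->unget_token($token);
            last;
        }
    }
    return $out;
}

# made-up sample input, loosely modelled on the message above
my $html = '<table><tr><td><TT>Tom wrote:<BR>OK<BR><BR>Steve</TT>'
         . '</td></tr></table>';
my $stream = HTML::TokeParser->new(\$html) or die "cannot parse HTML";
my $text   = get_text_keep_br($stream, "/table");
print $text, "\n";
```

in my script this would stand in for the $stream->get_text ("/table") call. all other tags are still dropped, so only the line breaks survive.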