I like this approach, but alas, it didn't recognize any text or HTML parts in my message.

```perl
#!/usr/bin/perl
# Program EM.pl
use strict;
use warnings;
use Email::MIME;
use HTML::TokeParser;
use Data::Dumper;

my $msgfile = "Andrew.msg";    # A test message file from MS Outlook

open(my $fh, '<', $msgfile) or die "Can't open $msgfile: $!\n";
my $message = do { local $/; <$fh> };    # slurp the whole file
close($fh);

# $message is the full text of the entire email message
my $parsed = Email::MIME->new($message)
    or die "Could not parse email message";

foreach my $part ($parsed->parts) {
    my $ct = $part->content_type || '';
    if ($ct =~ m{text/plain}i) {
        # You have a plain text part: do stuff here with $part->body
        print $part->body;
    }
    elsif ($ct =~ m{image/jpeg}i) {
        # You have a JPEG part in $part->body
    }
    elsif ($ct =~ m{text/html}i) {
        # You have an HTML part in $part->body
        my $html        = $part->body;
        my $plain_text  = '';
        my $parsed_text = HTML::TokeParser->new(\$html)
            or die "Cannot read message text for parsing and cleaning: $!";
        while (my $token = $parsed_text->get_token) {
            $plain_text .= $token->[1] if $token->[0] eq 'T';    # text
        }
        # Do stuff with $plain_text extracted from HTML here
        print $plain_text;
    }
    else {
        print "NO MATCH\n";
        # Test output only: zap non-word characters in the part's
        # internal fields, then dump them
        $part->{$_} =~ s/\W+//g for keys %$part;
        print Data::Dumper->Dump([%$part]);
    }
}
```

```
C:\Perl\Test\MIME>perl -w EM.pl
NO MATCH
$VAR1 = 'body';
$VAR2 = 'PPPBYahooGroupsLinksBBRPULLITovisityourgrouponthewebgotoBRAhrefhttpgroupsyahoocomgrouphaikukaiIIIhttpgroupsyahoocomgrouphaikukaiIIIABRnbspLITounsubscribefromthisgroupsendanemailtoBRAhref <cut... a lot more lines of this stuff> mailt stg10_5FF70102DFA__properties_version100X99q___Nd_Ad0';
$VAR3 = 'head';
$VAR4 = 'HASH0x1625814';
$VAR5 = 'mycrlf';
$VAR6 = '';
$VAR7 = 'header_names';
$VAR8 = 'HASH0x1c60278';
$VAR9 = 'order';
$VAR10 = 'ARRAY0x1af5550';
$VAR11 = 'parts';
$VAR12 = 'ARRAY0x16259ac';
$VAR13 = 'ct';
$VAR14 = 'HASH0x1aa33e4';

C:\Perl\Test\MIME>
```
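A hedged guess at the NO MATCH output: an Outlook .msg file is, as far as I know, stored in Microsoft's binary compound-file (OLE) format rather than as RFC 822/MIME text, and the `__properties_version` fragment in the dump looks like one of its internal stream names. If so, Email::MIME has no real MIME headers or parts to work with, and the message would first need to be saved in a MIME format such as .eml. A quick way to check what Email::MIME actually found is its `debug_structure` method, which returns a text outline of the part tree. This is a minimal sketch: it assumes a MIME text file named on the command line and otherwise falls back to an invented inline sample (addresses and boundaries are placeholders).

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Email::MIME;

# Hypothetical sample message, used when no file name is given.
my $sample = <<'EOM';
From: alice@example.com
To: bob@example.com
Subject: structure test
MIME-Version: 1.0
Content-Type: multipart/mixed; boundary="outer"

--outer
Content-Type: multipart/alternative; boundary="inner"

--inner
Content-Type: text/plain; charset="us-ascii"

Hello from the plain part.
--inner
Content-Type: text/html; charset="us-ascii"

<p>Hello from the <b>HTML</b> part.</p>
--inner--
--outer--
EOM

# Read the message file named on the command line (assumed to be MIME
# text, e.g. an .eml file), or fall back to the inline sample.
my $raw = $sample;
if (@ARGV) {
    open(my $fh, '<', $ARGV[0]) or die "Can't open $ARGV[0]: $!\n";
    binmode $fh;
    $raw = do { local $/; <$fh> };
    close $fh;
}

# debug_structure returns a text outline of the part tree Email::MIME
# found, including the content type of each part.
my $email = Email::MIME->new($raw);
print $email->debug_structure;
```

For a real MIME message the outline should list text/plain and text/html parts; if it shows a single part with no useful content type, the input is probably not MIME at all.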
My conclusion is that there's probably no simple
my $text = msg_clean($email);
function out there, and I'll have to do a top-down parse of the MIME object to get at the part of the email that interests me. That would replace the simple brute-force regex filtering I'm using right now, which works OK as long as the text is enclosed in proper tags, but breaks easily if it isn't.

This is basically what several of you (actually all of you) have tried to tell me, but I didn't quite want to give up on my laziness up front... A full parse of the email is more work, but it is also more robust, and in the long run it will surely let me be lazy at a higher level...
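The top-down parse described above can be sketched with Email::MIME's `walk_parts`, which calls a callback on the message itself and on every nested subpart. That matters because `parts`, as used in EM.pl, only returns one level: when the text and HTML alternatives sit inside a multipart/alternative container, the loop only ever sees the container. This is a minimal sketch, not a drop-in solution: `msg_clean` is the hypothetical name from the post, the inline sample message is invented, and the HTML step simply keeps HTML::TokeParser text tokens as EM.pl does (entities are not decoded).

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Email::MIME;
use HTML::TokeParser;

# Sketch of the wished-for msg_clean() (hypothetical name): return the
# readable text of every text/plain and text/html part, however deeply
# it is nested.
sub msg_clean {
    my ($raw) = @_;
    my $email = Email::MIME->new($raw);
    my @chunks;

    # walk_parts visits the message and every subpart recursively;
    # multipart containers match neither test below and are skipped.
    $email->walk_parts(sub {
        my ($part) = @_;
        my $ct = $part->content_type || '';

        if ($ct =~ m{^\s*text/plain}i) {
            push @chunks, $part->body;    # transfer encoding already decoded
        }
        elsif ($ct =~ m{^\s*text/html}i) {
            my $html = $part->body;
            my $p    = HTML::TokeParser->new(\$html) or return;
            my $text = '';
            while (my $token = $p->get_token) {
                $text .= $token->[1] if $token->[0] eq 'T';    # text tokens only
            }
            push @chunks, $text;
        }
    });

    return join "\n", @chunks;
}

# Hypothetical message: multipart/alternative nested in multipart/mixed.
my $sample = <<'EOM';
From: alice@example.com
To: bob@example.com
Subject: nested test
MIME-Version: 1.0
Content-Type: multipart/mixed; boundary="outer"

--outer
Content-Type: multipart/alternative; boundary="inner"

--inner
Content-Type: text/plain; charset="us-ascii"

Hello from the plain part.
--inner
Content-Type: text/html; charset="us-ascii"

<p>Hello from the <b>HTML</b> part.</p>
--inner--
--outer--
EOM

print msg_clean($sample), "\n";
```

Returning the text of both alternatives is a deliberate simplification; a real filter would probably prefer text/plain and fall back to the HTML part only when no plain part exists.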

So I think I'll start digging into MIME::Tools.
Thanks for your patience!
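For comparison, a similar walk can be sketched with MIME::Tools, under the same assumptions (MIME text input, an invented sample message). MIME::Parser's `parse_data` parses a message held in a scalar, `output_to_core(1)` keeps decoded bodies in memory instead of writing temporary files, and MIME::Entity's `parts_DFS` returns the entity plus all nested parts in depth-first order. The function name `text_parts` is made up for this example; it returns the decoded bodies as-is, so the HTML ones would still need the HTML::TokeParser step from EM.pl.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use MIME::Parser;

# Hypothetical helper: return [ mime_type, decoded_body ] pairs for
# every text/plain and text/html part, however deeply nested.
sub text_parts {
    my ($raw) = @_;
    my $parser = MIME::Parser->new;
    $parser->output_to_core(1);    # keep bodies in memory, no temp files
    my $entity = $parser->parse_data($raw);

    my @found;
    # parts_DFS lists the entity itself and all nested parts, depth first.
    for my $part ($entity->parts_DFS) {
        my $type = lc $part->effective_type;
        next unless $type eq 'text/plain' or $type eq 'text/html';
        my $body = $part->bodyhandle or next;    # multipart containers have none
        push @found, [ $type, $body->as_string ];
    }
    return @found;
}

# Hypothetical message: multipart/alternative nested in multipart/mixed.
my $sample = <<'EOM';
From: alice@example.com
To: bob@example.com
Subject: nested test
MIME-Version: 1.0
Content-Type: multipart/mixed; boundary="outer"

--outer
Content-Type: multipart/alternative; boundary="inner"

--inner
Content-Type: text/plain; charset="us-ascii"

Hello from the plain part.
--inner
Content-Type: text/html; charset="us-ascii"

<p>Hello from the <b>HTML</b> part.</p>
--inner--
--outer--
EOM

for my $p (text_parts($sample)) {
    printf "%s: %s\n", $p->[0], $p->[1];
}
```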
Best regards
-- Allan

In reply to Re^2: Extracting TEXT from email by ady
in thread Extracting TEXT from email by ady
