HTTP-404 has asked for the wisdom of the Perl Monks concerning the following question:

Dear friends, I'm subscribed to 4 mailing lists and get about 500 messages a day. I'm archiving them, but every mail comes with a signature, and it would save a lot of space if I could remove it. It looks like this:

    -------------------------------
    Know thyself? Absurd direction! Bubbles bear no introspection. -Khushhal Khan Khatak
    --
    PHP Windows Mailing List (http://www.php.net/)
    ----------------------------

I want to cut off this part:

    --
    PHP Windows Mailing List (http://www.php.net/)

Perl is the best way to do this. Thank you very much for your help.

Re: Removing Signature from Email
by Wodin (Acolyte) on Apr 08, 2001 at 13:30 UTC

    This response assumes the portion you want to remove is always on the line immediately after the signature designator.

    ## Assumes that the name of the file is in $messagename. Could
    ## easily be put in a sub and used that way.
    open(MESSAGE, '<', $messagename) or die "Couldn't open email: $!\n";
    while (<MESSAGE>) {
        # Test for the signature line ("--", with or without the
        # trailing space of the usual "-- " delimiter)
        if (/^-- ?$/) {
            # get rid of the next line by reading it in
            <MESSAGE>;
        }
        else {
            print $_;
        }
    }
    close MESSAGE or die "Couldn't close message: $!\n";

    This may not be the most efficient code in the world, but it will get the job done -- it would be used by taking an input file and piping to an output file.

    It's possible that someone can come up with a better solution by modifying $/ and then using a substitution to find the particular block that you wish to rid yourself of, but I'm simply not clever enough to do that tonight.
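A minimal sketch of that $/ idea, under the same assumption as above (the unwanted text is the "--" line plus the single line right after it). The helper name strip_sig_block and the sample message are made up for illustration; the in-memory filehandle only stands in for a real message file.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the original post): takes the whole
# message as one string and removes the "--" delimiter line together
# with the single line that follows it.
sub strip_sig_block {
    my ($message) = @_;
    # /m lets ^ match at the start of any line; "-- ?" accepts the
    # delimiter with or without its trailing space; \r? tolerates CRLF.
    $message =~ s/^-- ?\r?\n[^\n]*\n?//m;
    return $message;
}

# Sample message, invented for this sketch.
my $sample = "Hello list,\n\nsee you.\n"
           . "-- \nPHP Windows Mailing List (http://www.php.net/)\n";

# Reading through a filehandle with $/ undefined returns the whole
# input as one string instead of one line at a time.
open(my $fh, '<', \$sample) or die "Couldn't open sample: $!\n";
my $message = do { local $/; <$fh> };
close $fh or die "Couldn't close sample: $!\n";

print strip_sig_block($message);
```

Only the first "--" line is touched, so a closing row of dashes after the list name would still need its own substitution.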

      This is the code I have, but it doesn't work. It downloads all the mail and then saves it as an XML file:

      use Mail::POP3Client;

      $pop = new Mail::POP3Client(
          USER     => "*****",
          PASSWORD => "*****",
          HOST     => "127.0.0.1"
      );
      open(OUTPUT, ">mail.xml") or die;
      $id = 0;
      for ($i = 1; $i <= $pop->Count(); $i++) {
          $id++;
          print OUTPUT "<message>\n";
          $header = $pop->Head($i);
          $body   = $pop->Body($i);
          $header =~ m#^From: (.*?)$#m;
          $from = $1;
          $header =~ m#^To: (.*?)$#m;
          $to = $1;
          $header =~ m#^Subject: (.*?)$#m;
          $subject = $1;
          $header =~ m#^Date: (.*?)$#m;
          $date = $1;
          #print "$header\n";
          print OUTPUT "<from>$from</from>\n";
          print OUTPUT "<to>$to</to>\n";
          print OUTPUT "<subject>$subject</subject>\n";
          print OUTPUT "<date>$date</date>\n";
          print "Msg ID: $id\n";
          #have to edit here
          if ($body =~ /^--$/) {
              next;
          }
          else {
              print OUTPUT "<body>\n$body\n<\body>\n";
          }
          #ends
          print OUTPUT "</message>\n";
      }
      $pop->Close();

      Could anyone help me, please?

        Looking at the docs for Mail::POP3Client, I think you want to get the body as an array of lines and then iterate over that array, keeping only those lines which are not the signature and the line immediately following it. Right now, you're saying  $body = $pop->Body($i); which would be in a scalar context and force Mail::POP3Client to return the body as one big string.

        Hopefully, the following code will clear up your problem. Be warned that this is untested.

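        my @lines = $pop->Body($i);

        ## Some code passes, not much, not a little, what am I, a watch?

        ## iterate over @lines, checking if we want to keep it.
        print OUTPUT "<body>\n";
        foreach my $line (@lines) {
            # chomp first, so this works whether or not Body()
            # leaves a newline on each line
            chomp $line;
            if ($line =~ /^-- ?$/) {
                # Everything else is signature,
                # so we break the foreach loop
                last;
            }
            else {
                print OUTPUT "$line\n";
            }
        }
        # "</body>", not "<\body>": "\b" is a backspace in a double-quoted string
        print OUTPUT "</body>\n";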
        Hope this works.
Re: Removing Signature from Email
by mirod (Canon) on Apr 08, 2001 at 14:20 UTC

    Here is a piece of code that will remove the signature (and only the signature!) from a message, provided the signature is stored in the __DATA__ part of the script:

    #!/bin/perl -w
    use strict;

    # read the signatures
    $/ = "\n\n";
    my @sig = <DATA>;
    chomp @sig;    # chomp every element, so we get rid of the extra \n

    undef $/;
    my $message = <>;    # read the message

    foreach my $sig (@sig) {
        # \Q \E quotes the meta-characters in $sig
        # \s* skips trailing spaces or \n
        # and last just avoids unnecessary extra matches
        $message =~ s{\Q$sig\E\s*}{}s and last;
    }

    print $message;

    __DATA__
    _______________________________________________
    Perl-XML mailing list
    Perl-XML@listserv.ActiveState.com
    http://listserv.ActiveState.com/mailman/listinfo/perl-xml

    --
    PHP Windows Mailing List (http://www.php.net/)
    ----------------------------
Re: Removing Signature from Email
by Masem (Monsignor) on Apr 08, 2001 at 16:59 UTC
    Another solution, which may be applied to all those messages that appear NOT to follow the typical /^--/ signature indicator, is to read lines from the back of the message up until you get to a white space line. Then, if the last line read (i.e. the line just after the white space) starts with any repeated symbols, such as - or *, it's a good bet that it's a signature. This will err on the side of keeping anything that doesn't look like a sig, even though it might be one.
    my @message = (<MESSAGE>);
    my @possible_sig;
    my $line = pop @message;

    # Read past white space that may trail message
    # (the defined() checks stop both loops once @message runs out)
    while ( defined $line && $line =~ /^\s*$/ ) {
        $line = pop @message;
    }

    # Read in what may be the sig...
    while ( defined $line && $line !~ /^\s*$/ ) {
        unshift @possible_sig, $line;
        $line = pop @message;
    }

    # at this point, @message contains the original message,
    # minus the last 'paragraph' minus trailing whitespace.
    # @possible_sig contains this last paragraph.
    # If the first element of @possible_sig does look
    # like a standard sig, then we just drop it, and @message
    # is ready to go. Otherwise, just reattach it.
    if ( @possible_sig && $possible_sig[ 0 ] !~ /^[-\*%]{2,}/ ) {
        push @message, $line if defined $line;    # (put that whitespace back!)
        push @message, @possible_sig;
    }

    # @message can now be written to file.
    update: er, code was right, comment was wrong. @possible_sig should be dumped if it looks like a sig.
    Dr. Michael K. Neylon - mneylon-pm@masemware.com || "You've left the lens cap of your mind on again, Pinky" - The Brain
Re: Removing Signature from Email
by geektron (Curate) on Apr 09, 2001 at 05:50 UTC
    Read up on, and use, Mail::Internet (in the MailTools bundle). There is a utility method to remove signatures (aptly named remove_sig).
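A rough sketch of how that could look, assuming MailTools is installed from CPAN. The wrapper name strip_with_mail_internet and the sample message are invented for illustration; new(), remove_sig() and body() are the Mail::Internet methods.

```perl
use strict;
use warnings;
use Mail::Internet;    # part of the MailTools distribution (not core Perl)

# Hypothetical wrapper: takes the lines of one message (headers, blank
# line, body; each line ending in "\n") and returns the body lines
# with the signature removed.
sub strip_with_mail_internet {
    my (@lines) = @_;
    my $mail = Mail::Internet->new(\@lines);
    # remove_sig() looks for a "--" delimiter line within the last
    # 10 lines of the body by default and drops it and what follows.
    $mail->remove_sig;
    return @{ $mail->body };
}

# Sample message, invented for this sketch.
my @message = (
    "From: someone\@example.com\n",
    "Subject: test\n",
    "\n",
    "Hello list,\n",
    "-- \n",
    "PHP Windows Mailing List (http://www.php.net/)\n",
);

print strip_with_mail_internet(@message);
```

Because the delimiter is only searched for near the end of the body, a signature longer than the default window would need the line count passed to remove_sig explicitly.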

    Write your processing script, and set up a way to feed messages into it, whether that's a procmail filter, a .forward file, or an entry in /etc/aliases (all *NIX-based).