spiritway has asked for the wisdom of the Perl Monks concerning the following question:

Beloved Brethren and Sistren:

I am stymied over what I would think is a simple matter. I am working on an e-mail program, using the Net::POP3 module. This module returns a reference to an array of lines that contains the entire message, both headers and body. I am trying to figure out a way to get rid of the header information and preserve the body of the text.

I checked the documentation of Net::POP3, and it doesn't appear to have a way that I can simply ask for just the body text. I checked RFC 1939 and RFC 2822, and found that headers have the form <HeaderID> : <Header Text>. This is a big help, but not enough. Headers can span more than one line. When they do, the subsequent lines begin with whitespace.

I was trying to reconstruct the headers that were split into more than one line by finding newline/whitespace combinations, and replacing them with a space. This turned out to be quite difficult for me. First I tried:

$line =~ s/\n\s+/ /mg;

That doesn't work. I think the problem is that each line ends with a newline. There are no lines that contain a newline followed by another character.

I then reasoned that, since the newline is followed by whitespace, I might be able to search for lines that begin with whitespace (and simply remove those lines) using

$line =~ s/^\s+.*/ /g;

Unfortunately, this finds any such lines, including those contained in the body text. I want to preserve all of the text, and simply remove (or at least identify) the header lines.

So the question is, how can I reconstruct the headers that span more than one line? Or, equally useful, how can I distinguish headers from body text?

UPDATE: Thanks to all of you who have commented. You've given me some great ideas to try. I appreciate it.

Replies are listed 'Best First'.
Re: Removing Headers from E-mail Messages.
by ysth (Canon) on Jan 29, 2006 at 10:55 UTC
    The headers are divided from the body by an empty line (an array element containing just "\n").

    $msg = $pop3->get($msgnum);
    # strip header (the @$msg check stops the loop if no empty line is found)
    1 while @$msg and shift(@$msg) ne "\n";
    But, depending on what all you are doing with the message, you may want to use a module to parse the message:
    use Mail::Internet;

    $msglines = $pop3->get($msgnum) or ...;
    $msg = Mail::Internet->new($msglines) or ...;
    $body = $msg->body;
    for my $linenum (1 .. @$body) {
        # array indices start at 0, so body line N is element N-1
        print "Body line $linenum: $body->[$linenum - 1]\n";
    }
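    If you would rather not pull in a module, the same split can be sketched in plain Perl. This is only a sketch under a few assumptions: split_message is a hypothetical helper name (it is not part of Net::POP3), the sample message is made up, and the lines are assumed to arrive as an array reference with one line per element, as described in the question. Continuation lines are folded into the preceding header, and nothing after the first empty line is touched, so indented body lines survive.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of Net::POP3): takes an array reference of
# message lines and returns (\@unfolded_headers, \@body_lines).
sub split_message {
    my ($lines) = @_;
    my (@headers, @body);
    my $in_body = 0;
    for my $line (@$lines) {
        if ($in_body) {
            push @body, $line;            # body lines are kept verbatim
        }
        elsif ($line =~ /^\r?\n?\z/) {
            $in_body = 1;                 # first empty line ends the headers
        }
        elsif ($line =~ /^[ \t]/ && @headers) {
            # continuation line: fold it into the previous header
            (my $cont = $line) =~ s/^[ \t]+//;
            $cont =~ s/\r?\n\z//;
            $headers[-1] .= " $cont";
        }
        else {
            (my $hdr = $line) =~ s/\r?\n\z//;
            push @headers, $hdr;
        }
    }
    return (\@headers, \@body);
}

# Made-up sample message, for illustration only
my @sample = (
    "Received: from mail.example.com\n",
    "\tby mx.example.org; Sun, 29 Jan 2006 10:00:00 +0000\n",
    "Subject: test\n",
    "\n",
    "  An indented body line.\n",
    "Second body line.\n",
);
my ($headers, $body) = split_message(\@sample);
print "H: $_\n" for @$headers;
print "B: $_"   for @$body;
```

    The flag-based loop is a design choice: it makes a single pass and leaves the caller's array unmodified, unlike the shift approach above, which consumes the header lines.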
Re: Removing Headers from E-mail Messages.
by atcroft (Abbot) on Jan 29, 2006 at 10:57 UTC

    My first thought was to suggest Email::Simple, with which you could do something like:

    use Email::Simple;

    # Email::Simple->new expects the message as one string; this assumes
    # $msg is the array reference of lines returned by Net::POP3's get.
    my $mail = Email::Simple->new(join('', @$msg));
    my (%headers_i_care_about);
    foreach my $i_care_about (qw(Received Date From To Subject Return-Path)) {
        @{ $headers_i_care_about{$i_care_about} } = $mail->header($i_care_about);
        # or, if you want it as a single string, something like:
        # $headers_i_care_about{$i_care_about} =
        #     join(' ', $mail->header($i_care_about));
    }

    If that fails to work as you desire, then what you could fall back to is to look at the message line-by-line until you find the first completely blank line (which indicates the end of the headers, if memory serves), remembering the last line that did not begin with whitespace, and concatenating the current line with the previous one if it did, possibly on the order of:

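    my (%headers);
    my ($lastheader);
    foreach my $line (@msg) {
        last if ($line =~ m/^\s*$/);    # Reached end of headers
        if ($line =~ m/^\s+(.+)/) {
            # continuation line: append it to the previous header
            $headers{$lastheader} .= " $1" if defined $lastheader;
        }
        else {
            my @parts = split(/:\s*/, $line, 2);
            next unless defined $parts[1];    # skip lines without a colon
            $parts[1] =~ s/\s+$//;            # drop the trailing newline
            # note: a repeated header (e.g. Received) keeps only its last value
            $headers{$parts[0]} = $parts[1];
            $lastheader = $parts[0];
        }
    }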

    Hope that helps.

Re: Removing Headers from E-mail Messages.
by g0n (Priest) on Jan 29, 2006 at 10:57 UTC
    You may find Mail::POP3Client easier.

    use Mail::POP3Client;
    my $pop = Mail::POP3Client->new(
        USER     => "myuser",
        PASSWORD => "mypass",
        HOST     => "127.0.0.1",
    );
    for (my $i = 1; $i <= $pop->Count; $i++) {
        my $head = $pop->Head($i);    # headers only
        my $body = $pop->Body($i);    # body only
    }

    (Code untested).

    --------------------------------------------------------------

    "If there is such a phenomenon as absolute evil, it consists in treating another human being as a thing."

    John Brunner, "The Shockwave Rider".