in reply to Parsing Pegasus email boxes

For grins, I googled "pegasus mail", and found some pages at http://www.pmail.com/, including a pointer to a handful of tools for converting from Pegasus mail files to other forms (for some reason, the link is found here: http://kbase.pmail.gen.nz/pegasus.cfm). I didn't find anything useful about the Pegasus file format (but did find some indications that the format specs would not be open to public inspection).

Anyway, if any of those tools happens to work as a command-line utility (and is still available, up to date and within your budget), you could run it from your Perl script via system(), or read its output directly through a piped open(), to convert files as needed.
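A minimal sketch of the piped open() idea, assuming the converter writes its result to standard output. The command in @convert_cmd is a hypothetical stand-in (Perl itself, printing one header line) so that the example runs anywhere; swap in the real tool and its arguments. read_converted() is just an illustrative helper name.

```perl
use strict;
use warnings;

# Hypothetical converter command: replace with the real tool and arguments.
# Perl itself ($^X) stands in here so that the sketch is runnable as-is.
my @convert_cmd = ( $^X, '-e', 'print "From: My Name <me\@home.net>\n"' );

# List-form piped open: the command runs without a shell, and whatever it
# writes to standard output can be read from the returned filehandle.
sub read_converted {
    my @cmd = @_;
    open( my $pipe, '-|', @cmd ) or die "cannot run $cmd[0]: $!";
    my @lines = <$pipe>;
    close($pipe) or die "$cmd[0] failed, exit status $?";
    return @lines;
}

my @lines = read_converted(@convert_cmd);
print @lines;
```

The list form of open() avoids shell quoting problems with odd file names; if the tool only writes to a file instead of standard output, system() followed by an ordinary open() on the output file would be the simpler route.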

If that doesn't pan out, and you're just trying to extract email address strings, perhaps others can recommend a good module. I just tried Mail::Address, and it seems to work well for pulling addresses out of header lines. In other words, something like this (not tested on Pegasus data):

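use strict;
use Mail::Address;

my $hdrstring = "";
my $checknext = 0;

# Parse and print whatever header text has been collected, then reset.
sub flush_header {
    $checknext = 0;
    return unless $hdrstring;
    for my $a ( Mail::Address->parse($hdrstring) ) {
        print $a->name, " <", $a->address, ">\n";
    }
    $hdrstring = "";
}

open( my $mbox, "<", "pegasus-mail.file" ) or die "pegasus-mail.file: $!";
while (<$mbox>) {
    if ( /^(?:To|From|Cc|Bcc): (.*)/ ) {
        flush_header();      # don't lose a header that directly precedes this one
        $checknext++;
        $hdrstring = $1;
    }
    elsif ( $checknext and /^\s+\S/ ) {
        $hdrstring .= $_;    # folded continuation line
    }
    else {
        flush_header();
    }
}
flush_header();              # a header may still be pending at end of file
close $mbox;
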
update: Having seen Roger's non-module solution, I'd like to point out that multi-token addresses (like "My Name <me@home.net>") can often get split up by a line break in the header. Also, you may sometimes see the form "me@home.net (My Name)". The usage shown above for Mail::Address handles both problems, and normalizes the latter case, so that "My Name" is returned by "$a->name()", just as in the former case.
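To make those two cases concrete, here is a small sketch that feeds both forms to Mail::Address->parse(). The helper name name_and_address() and the sample strings are only for illustration, and it assumes Mail::Address (from the MailTools distribution) is installed.

```perl
use strict;
use warnings;
use Mail::Address;

# Return [name, address] pairs for every address found in one header value.
sub name_and_address {
    my ($header_value) = @_;
    return map { [ $_->name, $_->address ] }
        Mail::Address->parse($header_value);
}

my @cases = (
    "My Name\n <me\@home.net>",    # name and address split by a folded line
    "me\@home.net (My Name)",      # name given as a trailing comment
);

for my $case (@cases) {
    for my $pair ( name_and_address($case) ) {
        print "$pair->[0] <$pair->[1]>\n";
    }
}
```

Both inputs should come out in the same "name <address>" form, which is the normalization described above; a line-by-line regex would need extra logic to cope with either one.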