in reply to Parsing Pegasus email boxes
Anyway, if any of those tools happens to work as a command-line utility (and is still available, up to date and within your budget), you could use it via a system call (or even a pipeline open() statement) to convert files as needed within your Perl script.
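As a rough sketch of the pipeline open() idea: the helper below (read_converted is a name made up for this example) runs an external command and reads its output through a pipe. No actual Pegasus converter is assumed here; perl itself stands in for the tool so the snippet runs as-is, and the converter name in the comment is purely hypothetical.

```perl
use strict;
use warnings;

# Run an external converter and return its output lines.
# The command is passed as a list, so no shell quoting is involved.
sub read_converted {
    my @cmd = @_;
    open( my $pipe, "-|", @cmd ) or die "cannot run $cmd[0]: $!";
    my @lines = <$pipe>;
    close $pipe or die "$cmd[0] failed (wait status $?)";
    return @lines;
}

# Stand-in for a real converter: perl itself, printing one header line.
# With an actual tool this would be something like
#   read_converted( "some-converter", "pegasus-mail.file" )   # hypothetical name
my @out = read_converted( $^X, "-e", 'print "From: me\@home.net\n"' );
print @out;
```

The list form of open() avoids the shell entirely, which matters if mailbox file names contain spaces or other odd characters.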
If that doesn't pan out, and you're just trying to extract email address strings, perhaps others can recommend a good module. I just tried Mail::Address, and it seems to work well for the following case:
In other words, something like this (not tested on pegasus data):

    use strict;
    use Mail::Address;

    my $hdrstring = "";
    my $checknext = 0;

    # Parse and print whatever header text has been collected so far.
    sub flush_header {
        if ( $hdrstring ) {
            for my $a ( Mail::Address->parse( $hdrstring ) ) {
                print $a->name, " <", $a->address, ">", $/;
            }
            $hdrstring = "";
        }
    }

    open( my $mbox, "<", "pegasus-mail.file" ) or die $!;
    while (<$mbox>) {
        if ( /^(?:To|From|Cc|Bcc): (.*)/ ) {
            flush_header();          # a new header ends the previous one
            $checknext = 1;
            $hdrstring = $1;
        }
        elsif ( $checknext and /^\s+\S.*/ ) {
            $hdrstring .= $_;        # folded continuation line
        }
        else {
            $checknext = 0;
            flush_header();
        }
    }
    flush_header();                  # in case the file ends inside a header
    close $mbox;

Update: Having seen Roger's non-module solution, I'd like to point out that multi-token addresses (like "My Name <me@home.net>") can often get split up by a line-break in the header. Also, you may sometimes see the form "me@home.net (My Name)". The usage shown above for Mail::Address handles both problems, and normalizes the latter case, so that "My Name" is returned by "$a->name()", just like in the former case.
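To check the two address forms in isolation, here is a minimal sketch that feeds both to Mail::Address->parse directly. The sample strings are the ones from this post (the first with a line break inserted to imitate a folded header); it assumes the MailTools distribution is installed and has not been run against real Pegasus data.

```perl
use strict;
use warnings;
use Mail::Address;

my @headers = (
    "My Name\n <me\@home.net>",   # phrase form, folded across a line break
    "me\@home.net (My Name)",     # comment form
);

my @parsed;
for my $hdr (@headers) {
    for my $addr ( Mail::Address->parse($hdr) ) {
        # Keep name/address pairs so the two forms can be compared.
        push @parsed, [ $addr->name, $addr->address ];
        print $addr->name, " <", $addr->address, ">\n";
    }
}
```

If the module behaves as described, both entries should come out with the same name and address.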