In "Parsing email files, what are the best modules?", I stated my objectives for a small project. Essentially, I need to make sure my email distribution lists are up to date. I'm using Pegasus Mail as my email client, and the Perl I'm using is ActiveState 5.8.0.806 on a Windows box.
Thanks to the code from Roger, in reply to that question, two out of the three objectives are working just fine. :)
The third part, addressed in sub parse_mail_folder, basically won't work because the 'mailboxes' are not in Unix format. I couldn't work out why the code wouldn't drop into the 'foreach' loop in that sub, given that it was doing steps 1 and 2 perfectly and setting up the path/name to the mailboxes okay. I then had a look at one of the examples in the "MailBox" distribution
    #!/usr/bin/perl

    # Demonstration on how to use the manager to open folders, and then
    # to print the headers of each message.
    #
    # This code can be used and modified without restriction.
    # Mark Overmeer, <mailbox@overmeer.net>, 9 nov 2001

    use warnings;
    use strict;
    use lib '..', '.';
    use Mail::Box::Manager 2.00;

    #
    # Get the command line arguments.
    #

    die "Usage: $0 folderfile\n"
        unless @ARGV==1;

    my $filename = shift @ARGV;

    #
    # Open the folder
    #

    my $mgr    = Mail::Box::Manager->new;

    my $folder = $mgr->open
      ( $filename
      , extract => 'LAZY'   # never take the body unless needed
      );                    # which saves memory and time.

    die "Cannot open $filename: $!\n"
        unless defined $folder;

    #
    # List all messages in this folder.
    #

    my @messages = $folder->messages;
    print "Mail folder $filename contains ", scalar @messages, " messages:\n";

    my $counter  = 1;
    foreach my $message (@messages)
    {   printf "%3d. ", $counter++;
        print $message->get('Subject') || '<no subject>', "\n";
    }

    #
    # Finish
    #

    $folder->close;
and tried it against one of the Pegasus mailboxes
    D:\temp\MailBox\Mail-Box-2.051\examples>perl open.pl c:\pmail\mbox\FOL03E97.PMM
    Mail folder c:\pmail\mbox\FOL03E97.PMM contains 0 messages:
I then created a "Unix" mailbox in Pegasus and ran the same script against that. Apart from a 'warning' message, it worked fine. :)
    D:\temp\MailBox\Mail-Box-2.051\examples>perl open.pl c:\pmail\mbox\unx05e52.mbx
    WARNING: Illegal character in field name From sydney.dialix.com.au!bounced-addr Sat Nov 08 10
    Mail folder c:\pmail\mbox\unx05e52.mbx contains 8 messages:
      1. Summary of your weekly E-Mail charges from DIALix Sydney
      2. Summary of your weekly E-Mail charges from DIALix Sydney
      3. Connect debit
      4. New Cheque book
      5. RE: Deposit clearance
      6. Deposit clearance
      7. RE: September statement
      8. September statement
So I have now realised that there is probably no module to parse the Pegasus mailboxes, because they are not in Unix format. I don't intend to convert the mailboxes either, as there are nearly 300 of them and I can't find a tool to convert them en masse.
Therefore, this last part of the task/project is to simply read through the 300 'mailboxes' (files), and look for any email addresses. The email addresses could be in the header or body of the files, and many are multi-part and have encoded parts. All that said, it's just:
1. Reading each file, record by record (CR/LF as terminators)
2. Search the record for any email address/s
3. Extract the email address/es, and if an address isn't already in the hash built by sub "load_mail_list", add its details to the hash, as Roger suggested:
    if (! exists $MailList->{$addr}) {
        # ok, we haven't seen this Email address yet
        $MailList->{$addr} = $name;
        # and do other things
    }
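The three steps above could be sketched roughly as follows. This is only a minimal sketch under stated assumptions: the sub names `find_addresses` and `scan_file` and the pattern `$addr_re` are made up for illustration, the pattern is deliberately loose rather than a full RFC 2822 matcher, and the name is left empty because only the address is captured here. It does not decode base64 or quoted-printable parts, so addresses inside encoded bodies would be missed.

```perl
use strict;
use warnings;

# Loose, illustrative pattern: good enough for harvesting candidates,
# not a complete RFC 2822 address parser.
my $addr_re = qr/[A-Za-z0-9._%+-]+\@[A-Za-z0-9-]+(?:\.[A-Za-z0-9-]+)+/;

# Return every address-like string found in one record (line).
sub find_addresses {
    my ($line) = @_;
    return $line =~ /($addr_re)/g;
}

# Read one mailbox file record by record and add any address that is
# not yet a key of the hash ref $MailList. Existing entries are kept.
sub scan_file {
    my ($path, $MailList) = @_;
    open my $fh, '<', $path or die "Cannot open $path: $!\n";
    binmode $fh;                  # read raw bytes, no text-mode translation
    while (my $line = <$fh>) {
        $line =~ s/\r?\n\z//;     # strip the CR/LF (or bare LF) terminator
        for my $addr (find_addresses($line)) {
            $addr = lc $addr;     # compare addresses case-insensitively
            $MailList->{$addr} = '' unless exists $MailList->{$addr};
        }
    }
    close $fh;
    return $MailList;
}

# Hypothetical usage; the glob pattern is an assumption based on the
# folder path shown earlier in this post:
# my %list;
# scan_file($_, \%list) for glob 'c:/pmail/mbox/*.PMM';
```

Lower-casing the address before the `exists` test is a design choice to avoid near-duplicate entries; drop it if the list from "load_mail_list" keeps the original case.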
Could someone please show me how to search for one or more email addresses in a line (record)? It would also be nice to grab the 'name' along with the email address, but that might be rather difficult, as there are so many different ways of writing the name next to an email address.
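One possible approach to the name problem, offered as a hedged sketch rather than a complete answer: find each address with a global match, then look at the text immediately before and after it for three common forms (`"Name" <addr>`, `Name <addr>` and `addr (Name)`). The sub name `extract_pairs` and the pattern `$addr_re` are hypothetical, and many real-world formats (encoded names, nested comments, group syntax) are not handled.

```perl
use strict;
use warnings;

# Loose, illustrative address pattern (an assumption, not RFC 2822).
my $addr_re = qr/[A-Za-z0-9._%+-]+\@[A-Za-z0-9-]+(?:\.[A-Za-z0-9-]+)+/;

# Return a list of [name, address] pairs found in one record.
# The name is '' when none of the three recognised forms applies.
sub extract_pairs {
    my ($line) = @_;
    my @pairs;
    while ($line =~ /($addr_re)/g) {
        my $addr   = $1;
        my $before = substr($line, 0, $-[1]);   # text left of the address
        my $after  = substr($line, $+[1]);      # text right of the address
        my $name   = '';
        if ($before =~ /"([^"]*)"\s*<\z/) {
            $name = $1;                         # "Name" <addr>
        }
        elsif ($before =~ /([^<>",;:()]+?)\s*<\z/) {
            $name = $1;                         # Name <addr>
        }
        elsif ($after =~ /\A>?\s*\(([^)]*)\)/) {
            $name = $1;                         # addr (Name)
        }
        $name =~ s/^\s+|\s+$//g;                # trim surrounding whitespace
        push @pairs, [$name, lc $addr];
    }
    return @pairs;
}

# Usage: print each pair found in a sample header line.
for my $pair (extract_pairs('Cc: "Smith, Bob" <bob@a.com>, joe@c.net (Joe)')) {
    my ($name, $addr) = @$pair;
    print "name='$name' addr='$addr'\n";
}
```

Each pair could then be fed into the `exists $MailList->{$addr}` check, storing the name only when it is non-empty.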
Thanks, :)
Peter
In reply to Parsing Pegasus email boxes by peterr