Hi,

In "Parsing email files, what are the best modules ?" I stated my objectives for a small project. Essentially, I need to make sure my email distribution lists are up to date. I'm using Pegasus Mail as my email client, and the Perl I'm using is ActiveState 5.8.0.806 on a Windows box.

Thanks to the code from Roger, in reply to that question, two of the three objectives are working just fine. :)

The third part, addressed in

sub parse_mail_folder

basically won't work because the 'mailboxes' are not in Unix format. I couldn't work out why the code wouldn't drop into the 'foreach' loop (in _that_ sub), because it was doing steps 1 and 2 perfectly and setting up the path/name to the mailboxes okay. I then had a look at one of the examples in the "MailBox" distribution

#!/usr/bin/perl

# Demonstration on how to use the manager to open folders, and then
# to print the headers of each message.
#
# This code can be used and modified without restriction.
# Mark Overmeer, <mailbox@overmeer.net>, 9 nov 2001

use warnings;
use strict;
use lib '..', '.';

use Mail::Box::Manager 2.00;

#
# Get the command line arguments.
#

die "Usage: $0 folderfile\n"
    unless @ARGV==1;

my $filename = shift @ARGV;

#
# Open the folder
#

my $mgr    = Mail::Box::Manager->new;

my $folder = $mgr->open
   ( $filename
   , extract => 'LAZY'   # never take the body unless needed
   );                    #  which saves memory and time.

die "Cannot open $filename: $!\n"
    unless defined $folder;

#
# List all messages in this folder.
#

my @messages = $folder->messages;
print "Mail folder $filename contains ", scalar @messages, " messages:\n";

my $counter  = 1;
foreach my $message (@messages)
{   printf "%3d. ", $counter++;
    print $message->get('Subject') || '<no subject>', "\n";
}

#
# Finish
#

$folder->close;

and tried it against one of the Pegasus mailboxes

D:\temp\MailBox\Mail-Box-2.051\examples>perl open.pl c:\pmail\mbox\FOL03E97.PMM

Mail folder c:\pmail\mbox\FOL03E97.PMM contains 0 messages:

I then created a "Unix" mailbox in Pegasus, ran the same script against that, and apart from a number of 'warning' messages, it worked fine. :)

D:\temp\MailBox\Mail-Box-2.051\examples>perl open.pl c:\pmail\mbox\unx05e52.mbx

WARNING: Illegal character in field name From sydney.dialix.com.au!bounced-addr
Sat Nov 08 10
Mail folder c:\pmail\mbox\unx05e52.mbx contains 8 messages:
  1. Summary of your weekly E-Mail charges from DIALix Sydney
  2. Summary of your weekly E-Mail charges from DIALix Sydney
  3. Connect debit
  4. New Cheque book
  5. RE: Deposit clearance
  6. Deposit clearance
  7. RE: September statement
  8. September statement

... so I have now realised that there is probably no module to parse the Pegasus mailboxes, because they are not in Unix format. I don't intend to convert the mailboxes either, as there are nearly 300 and I can't find a tool to convert them en masse.

Therefore, the last part of this task/project is simply to read through the 300 'mailboxes' (files) and look for any email addresses. The addresses could be in the header or the body of the files, and many messages are multi-part and have encoded parts. All that said, it's just:

1. Read each file, record by record (CR/LF as terminators)
2. Search the record for any email address(es)
3. Extract the email address(es) and, if an address isn't already in the list built by sub "load_mail_list", add its details to the list, as Roger suggested:

if (! exists $MailList->{$addr}) {
    # ok, we haven't seen this Email address yet
    $MailList->{$addr} = $name;

    # and do other things
}
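To make the three steps concrete, here is a rough first sketch of the loop I have in mind. The names extract_addresses and scan_mailbox are placeholders of my own (they are not from Roger's code or from any module), and the pattern is deliberately naive: it is not a full RFC matcher, it will miss or over-match some forms, and it cannot see addresses hidden inside base64-encoded parts. The name is stored as an empty string because this sketch makes no attempt to find it.

```perl
use strict;
use warnings;

# Deliberately loose pattern (an assumption, not a standard):
# local part, '@', then a dotted domain name.
my $ADDR_RE = qr/[A-Za-z0-9._%+-]+\@[A-Za-z0-9-]+(?:\.[A-Za-z0-9-]+)+/;

# Step 2: return every address-like string found in one record,
# lower-cased so that the same address is only stored once.
sub extract_addresses {
    my ($line) = @_;
    return map { lc $_ } $line =~ /($ADDR_RE)/g;
}

# Steps 1 and 3: read one file record by record and add any
# address not yet seen to the hash referenced by $MailList.
sub scan_mailbox {
    my ($path, $MailList) = @_;
    open my $fh, '<', $path or die "Cannot open $path: $!\n";
    while (my $line = <$fh>) {
        $line =~ s/\r?\n\z//;               # strip the CR/LF terminator
        for my $addr (extract_addresses($line)) {
            next if exists $MailList->{$addr};
            $MailList->{$addr} = '';        # name unknown at this point
        }
    }
    close $fh;
    return;
}
```

Each of the 300 files would then be passed in turn, for example with scan_mailbox($_, $MailList) for glob 'c:/pmail/mbox/*.PMM'; where $MailList is the hash reference already filled by load_mail_list.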

Could someone please show me how to search for one or more email addresses in a line (record)? It would also be nice to grab the 'name' along with the email address, but that might be rather difficult, as there are so many different formats for writing a name with an email address.

Thanks, :)

Peter

In reply to Parsing Pegasus email boxes by peterr