I have a KMail folder that I need to parse to extract all the names and email addresses. Basically, to do this I need to:

1.  Put a recursive folder list of all the files into an array
2.  Go through the array and open each file
3.  For each file, if a name/email address has not been seen already, add the name/email address to an array
4.  Write the contents of the array to a file
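
A minimal sketch of steps 3 and 4, assuming a hash keyed by the lowercased address is acceptable in place of a plain array (a hash makes the "already seen?" check a single lookup). The helper names `remember_address` and `write_address_list` and the `Name <email>` output format are my own assumptions, not anything KMail prescribes.

```perl
use strict;
use warnings;

# Hypothetical helper: record an address only the first time it is seen.
# $seen is a hash ref mapping lowercased email => display name (may be '').
# Returns 1 if the address was new, 0 if it was already recorded.
sub remember_address {
    my ($seen, $name, $email) = @_;
    my $key = lc $email;
    return 0 if exists $seen->{$key};
    $seen->{$key} = $name;
    return 1;
}

# Hypothetical helper: write one "Name <email>" line per address,
# sorted by address; addresses without a name are written bare.
sub write_address_list {
    my ($seen, $outfile) = @_;
    open my $out, '>', $outfile or die "Cannot write $outfile: $!\n";
    for my $email (sort keys %$seen) {
        my $name = $seen->{$email};
        my $line = length($name) ? "$name <$email>" : $email;
        print {$out} "$line\n";
    }
    close $out or die "Cannot close $outfile: $!\n";
    return;
}
```

Because the key is lowercased, `Bill@Example.com` and `bill@example.com` count as the same address, and the first name seen for it wins.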

I have step 1 basically done with the following script:

    use strict;
    use warnings;
    use File::Find;

    find(\&wanted, ".");

    sub wanted {
        return if -d;
        print "$File::Find::name", "\n";
    }

Instead of printing, I need to add the path/filename to an array. I also need to filter out all *.pl, *.php, .directory, ".." and "." entries. (It may be easier to do the recursive find only on the "/cur" and "/new" paths, excluding the ".directory" files.)
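
A minimal sketch of that variant of step 1, assuming the messages live in Maildir-style "cur" and "new" directories somewhere below the starting directory. The helper name `collect_mail_files` is hypothetical; the File::Find variables it uses (`$File::Find::dir`, `$File::Find::name`) are the standard ones.

```perl
use strict;
use warnings;
use File::Find;

# Hypothetical helper: return the full path of every plain file that sits
# directly inside a "cur" or "new" directory below $top.
sub collect_mail_files {
    my ($top) = @_;
    my @files;
    find(
        sub {
            # Inside the callback, $_ is the bare file name.
            return unless -f $_;             # plain files only, so "." and directories are skipped
            return if $_ eq '.directory';    # skip the .directory entries
            return if /\.(?:pl|php)\z/i;     # skip scripts
            # Keep only files whose containing directory is named cur or new.
            return unless $File::Find::dir =~ m{(?:^|/)(?:cur|new)\z};
            push @files, $File::Find::name;
        },
        $top
    );
    return @files;
}

# Usage: perl thisscript.pl /path/to/Mail
if (@ARGV) {
    print "$_\n" for collect_mail_files($ARGV[0]);
}
```

Filtering on the directory name inside the callback keeps it to a single `find` call; the alternative would be to locate the "cur" and "new" directories first and run `find` on each.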

For parts 2 and 3, I do have some scripts from years back that went through Pegasus email folders. From memory, each Pegasus email folder holds many email messages within it, whereas KMail has one file for each email. The email address may not necessarily be in the email headers, so I also need to expect names/email addresses in the body part of each file. Here is one of those older scripts; possibly it can be modified to suit:
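
A rough sketch of scanning one message for addresses in both headers and body, using only core Perl. The pattern is a deliberately loose heuristic of my own, not an RFC 5322 parser, and the helper names `extract_addresses` and `addresses_in_file` are hypothetical. It assumes the text is readable as-is: base64 or quoted-printable bodies and MIME-encoded names are not decoded.

```perl
use strict;
use warnings;

# Loose pattern for something that looks like an email address (assumption:
# good enough for harvesting, not a validator).
my $EMAIL = qr/[A-Za-z0-9._%+-]+\@[A-Za-z0-9-]+(?:\.[A-Za-z0-9-]+)+/;

# "Quoted Name" <addr>  or  Plain Name <addr>
my $PAIR = qr/(?:"([^"\n]*)"|([^"<>,;:\n]*?))\s*<($EMAIL)>/;

# Hypothetical helper: return a list of [name, email] pairs found anywhere
# in $text. A bare address yields an empty name.
sub extract_addresses {
    my ($text) = @_;
    my @found;
    while ($text =~ /$PAIR|($EMAIL)/g) {
        # Copy the captures before running any other regex.
        my $name  = $1 // $2 // '';
        my $email = $3 // $4;
        $name =~ s/^\s+|\s+$//g;    # trim surrounding whitespace
        push @found, [$name, $email];
    }
    return @found;
}

# Hypothetical helper: slurp one message file and scan all of it,
# headers and body alike.
sub addresses_in_file {
    my ($path) = @_;
    open my $fh, '<', $path or die "Cannot read $path: $!\n";
    local $/;                       # slurp mode
    my $text = <$fh>;
    close $fh;
    return extract_addresses($text // '');
}
```

Two limitations to expect: in body text the "name" is whatever precedes `<addr>` on that line, so it can pick up stray leading words, and header values such as Message-ID look like addresses and would need to be filtered out separately.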

    #!C:\Perl\bin\Perl.exe -w
    use strict;
    use IO::File;
    use Data::Dumper;
    use Mail::MboxParser;

    BEGIN {
        use CGI::Carp qw(carpout);
        open(LOG, ">>parsembox-log") or die("Unable to open parsembox-log: $!\n");
        carpout(LOG);
    }

    use Mail::Box;
    use Mail::Box::Manager;
    use Mail::Message;

    # seen addresses: email => name (was used below without being declared)
    my $MailList = {};

    my $mb = Mail::MboxParser->new('/home/*********/Mail/.family.directory/Browne, Bill & Martha/FOL03E97.PMM', decode => 'ALL');

    # -----------

    # slurping
    for my $msg ($mb->get_messages) {
        print $msg->header->{subject}, "\n";
        $msg->store_all_attachments('/tmp');
    }

    # iterating
    while (my $msg = $mb->next_message) {
        print $msg->header->{subject}, "\n";
        # ...
    }

    # single quotes so that @somewhere is not interpolated
    my $data = 'To: mickey@somewhere.com';
    my $msg  = Mail::Message->read($data);
    my @addr = $msg->get('To')->addresses;

    exit(0);

    sub parse_mail_folder {
        print "into parse_mail_folder sub", "\n";
        my $folder_file = shift;
        print "Folder file: ", $folder_file, "\n";
        my $mgr = Mail::Box::Manager->new();
        print "my \$mgr value: ", $mgr, "\n";
        my $folder = $mgr->open($folder_file) or die "Cannot open Folder", "\n";
        print "my folder value: ", $folder, "\n";
        #my $message1 = $folder->message;
        #print "message: ",$message1,"\n";
        my @email_addr;
        foreach my $message ($folder->messages) {
            print $message->get('Subject') || '<no subject>', "\n";
            print "into foreach loop", "\n";
            my $dest = $message->get('To');    # retrieve the To-address
            print $dest, "\n";
            @email_addr = split /,/, $dest;    # retrieve multiple addresses

            # assume the email address format is as follows -
            #
            # John & Jenny Arnold <johnarnold@somedomain.com>
            #
            # you have to tweak a bit if the format is not as expected
            # or use the Mail::Address module to do the trick - to
            # convert the mail address into its canonical form.
            foreach (@email_addr) {
                my ($name, $addr) = /(.*)<(.*)>/;
                next unless defined $addr;     # not in "Name <addr>" form
                # "=~" binds the substitution to the variable; "=" would
                # assign the substitution count instead
                $name =~ s/^\s+//;    # trim spaces at front
                $name =~ s/\s+$//;    # trim spaces at rear
                $addr =~ s/^\s+//;    # trim spaces at front
                $addr =~ s/\s+$//;    # trim spaces at rear
                #print Dumper($addr);
                print $addr, "\n";
                if (! exists $MailList->{$addr}) {
                    # ok, we haven't seen this Email address yet
                    $MailList->{$addr} = $name;
                    # and do other things
                    print Dumper($name);
                }
            }
        }
        $folder->close;
    }

    sub load_mail_list {
        my $filename = shift;
        my $f = IO::File->new($filename, "r") or die "Can not open mail list";
        my %mlist;

        # load the header
        chomp($mlist{title}  = <$f>);
        chomp($mlist{sender} = <$f>);
        chomp($mlist{nosig}  = <$f>);
        <$f>;

        # load the rest of the email addresses
        my %MailAddress;
        while (<$f>) {
            chomp;
            my ($name, $email) = /^(.*)\s+<(.*)>$/;
            next unless defined $email and $email ne '';
            $MailAddress{$email} = $name;
        }
        $mlist{mlist} = \%MailAddress;
        return \%mlist;
    }

    sub load_mail_folders {
        my $filename = shift;
        my $f = IO::File->new($filename, "r") or die "Can not open mail list";
        my %mbox;
        while (<$f>) {
            chomp;
            next unless ( $_ ne '' and m/^0,0,/ );
            s/"//g;
            my @fld = split /,/;
            my ($folder) = $fld[2] =~ /.*:.*:(.*)/;
            # full path to mboxes
            $mbox{$fld[-1]} = "/home/*********/Mail/.family.directory/Browne, Bill & Martha/$folder.PMM";
        }
        return \%mbox;
    }

In reply to extracting name & email address by peterr
