peterr has asked for the wisdom of the Perl Monks concerning the following question:

There is a (Kmail) folder that I need to parse through and extract all the names and emails. Basically to do this

1.  Put a recursive folder list of all the files into an array
2.  Go through the array and open each file
3.  For each file, if the name/email address is not found, add the name/email address to an array
4.  Write the contents of the array to a file

I have step 1 basically done with the following script..

use strict; use warnings; use File::Find; find(\&wanted, "."); sub wanted { return if -d; print "$File::Find::name", "\n"; }

Instead of a print, add the path/filename to an array. I need to filter out all *.pl , *.php, .directory , ".." and "."  (It may be easier to only do a recursive find on "/cur" and "/new" paths, excluding files ".directory" )

For parts 2 and 3, I do have some scripts from years back that went through Pegasus email folders. From memory, each Pegasus email folder has many email messages within in. For this task now, KMail has one file for each email. The email address may not nessarily be in the email headers, so need to also expect the names/email address in the body part of each file. Here is one of those older scripts; possibly it can be modified to suit ..

#C:\Perl\bin\Perl.exe -w use strict; use IO::File; use Data::Dumper; use Mail::MboxParser; BEGIN { use CGI::Carp qw(carpout); open(LOG, ">>parsembox-log") or die("Unable to open parsembox-log: $!\n"); carpout(LOG); } use Mail::Box; use Mail::Box::Manager; use Mail::Message; my $mb = Mail::MboxParser->new('/home/*********/Mail/.family.directory +/Browne, Bill & Martha/FOL03E97.PMM', decode => 'ALL'); # ----------- # slurping for my $msg ($mb->get_messages) { print $msg->header->{subject}, "\n"; $msg->store_all_attachments('/tmp'); } # iterating while (my $msg = $mb->next_message) { print $msg->header->{subject}, "\n"; # ... } my $data ="To: mickey@somewhere.com"; my $msg = Mail::Message->read($data); my @addr = $msg->get('To')->addresses; exit(0); sub parse_mail_folder { print "into parse_mail_folder sub","\n"; my $folder_file = shift; print "Folder file: ",$folder_file,"\n"; my $mgr = Mail::Box::Manager->new(); print "my $mgr value: ",$mgr,"\n"; my $folder = $mgr->open($folder_file) or die "Cannot open Folder", +"\n"; print "my folder value: ",$folder,"\n"; #my $message1 = $folder->message; #print "message: ",$message1,"\n"; my @email_addr; foreach my $message ($folder->messages) { print $message->get('Subject') || '<no subject>', "\n"; print "into foreach loop","\n"; my $dest = $message->get('To'); # retrieve the To-address print $dest,"\n"; @email_addr = split /,/, $dest; # retrieve multiple addresses # assume the email address format is as follows - # # John & Jenny Arnold <johnarnold@somedomain.com> # # you have to tweak a bit if the format is not as expected # or use the Mail::Address module to do the trick - to # convert the mail address into its canonical form. foreach (@email_addr) { my ($name, $addr) = /(.*)<(.*)>/; $name = s/^\s+//g; # trim spaces at front $name = s/\s+$//g; # trim spaces at rear $addr = s/^\s+//g; # trim spaces at front $addr = s/\s+$//g; # trim spaces at rear #print Dumper($addr); print $addr,"\n"; if (! exists $MailList->{$addr}) { # ok, we haven't seen this Email address yet $MailList->{$addr} = $name; # and do other things print Dumper($name); } } } $folder->close; } sub load_mail_list { my $filename = shift; my $f = new IO::File $filename, "r" or die "Can not open mail list +"; my %mlist; # load the header chomp($mlist{title} = <$f>); chomp($mlist{sender} = <$f>); chomp($mlist{nosig} = <$f>); <$f>; # load the rest of the email addresses my %MailAddress; while (<$f>) { chomp; my ($name, $email) = /^(.*)\s+<(.*)>$/; next if $email eq ''; $MailAddress{$email} = $name; } $mlist{mlist} = \%MailAddress; return \%mlist; } sub load_mail_folders { my $filename = shift; my $f = new IO::File $filename, "r" or die "Can not open mail list +"; my %mbox; while (<$f>) { chomp; next unless ( $_ ne '' and m/^0,0,/ ); s/"//g; my @fld = split /,/; my ($folder) = $fld[2] =~ /.*:.*:(.*)/; $mbox{$fld[-1]} = "/home/*********/Mail/.family.directory/Brow +ne, Bill & Martha/$folder.PMM"; # full path to mboxes } return \%mbox; }

Replies are listed 'Best First'.
Re: extracting name & email address
by karlgoethebier (Abbot) on Feb 23, 2015 at 10:18 UTC
      KMail is a KDE email client. Some lines later you talk about Pegasus, a Windows email client. And why do you need to filter out "*.pl" or "*.php" in an mbox folder?

      I'm using KMail folders/files. Pegasus was an example from years ago. Perl & php files are in the same path. I don't really need to bypass them though, as they won't contain email addresses.

      And what does this mean: "For each file, if the name/email address is not found, add the name/email address to an array"

      Simply to load a unique array , based on name/email address. That isn't important though, as many emails will appear more than once. I can filter out the duplicates later

        The following code works to a point. It is writing name and email address out to a file. The only real problem is the msg "Complex regular subexpression recursion limit (32766) exceeded at /usr/share/perl5/Email/Address.pm line 108."

        #!/usr/bin/env perl # use strict; use warnings; use File::Find; use File::Slurp qw( read_file ); use Email::Address; my $directory = '/home/*****/Mail/.family.directory/Browne, Bill & Ma+ +rtha';'; my $outfile = 'output.txt'; my @found_files; find( sub { push @found_files, $File::Find::name }, $directory ); foreach(@found_files){ my $file = "$_"; if (-f $file) { print $_,"\n"; my $intext = File::Slurp::read_file( $file ); my @emails = Email::Address->parse( $intext ); File::Slurp::write_file( $outfile, {append => 1 }, join("\n", @ema +ils) ); } }

        The file that the warning msg appears has a large attachment. So, somehow need to bypass any attachments in the slurp ?

Re: extracting name & email address
by GrandFather (Saint) on Feb 23, 2015 at 03:39 UTC

    Is there a question?

    Perl is the programming world's equivalent of English
      The code below has a few errors. All I want to be able to do at first, is to load an array with all the filenames after a recursive path search.
      use strict; use IO::File; my $directory = '/home/******/Mail/.family.directory/Browne, Bill & Ma +rtha'; opendir(DIR,$directory); my @files = readdir(DIR); closedir(DIR); foreach(@files){ $f = new IO::File::open; while (<$f>) { //.. something here... } print $_,"\n"; }
      Is this how to approach this ?

        For finding all files recursively, just use File::Find.

        use File::Find; my @found_files; find( sub { push @found_files, $File::Find::name }, $directory );

        Is this how to approach this ?

        Maybe

        Consider

        Main( @ARGV ); exit( 0 ); sub Main { my @files = RecursivePathSearch( $path ); for my $file ( @files ){ SomethingHere( $file ); } } sub RecursivePathSearch { my( $path ) = @_; use File::Find::Rule qw/ find rule/ return rule( file => not_name => [ '*.pl', ], )->in( $path ); } sub SomethingHere { my( $file ) = @_; use Path::Tiny qw/ path /; use Email::Address; my $stuff = path( $file )->slurp_raw; return Email::Address->parse( $stuff ); }
      Yes, my question would be - "How do I do the other 3 steps please" ?

        Yes, my question would be - "How do I do the other 3 steps please" ?

        Hmm :) 1) write some code or 2) hire a programmer

        Which would you like to try?