in reply to Re^5: extracting name & email address
in thread extracting name & email address

Found some code that strips out the attachments and writes them as files. I tried it on 2 test files, one with 3 parts, another with about 15 parts. In both cases the script wrote out the attachments correctly. Here is that code with a 'print' added ..

use Email::MIME; use Email::MIME::Attachment::Stripper; use File::Slurp qw(slurp write_file); my $infile = '/home/******/Mail/.family.directory/1280915907.6583.I9x +0z:2,S'; my $m = Email::MIME->new( scalar slurp $infile ); my $s = Email::MIME::Attachment::Stripper->new( $m, 'force_filename' = +> 1 ); print $s->message; #displays "Email::MIME=HASH(0xe0fca8)" foreach my $attachment ( $s->attachments ) { write_file( $attachment->{filename}, { buf_ref => \$attachment->{payload} } ) or die "Can't write $attachment->{filename}: $!\n"; }

It displays - Email::MIME=HASH(0xe0fca8) How do I get the contents of part one into a variable so that I can them extract the email addresses please ?

Replies are listed 'Best First'.
Re^7: extracting name & email address
by peterr (Scribe) on Feb 25, 2015 at 02:34 UTC

    Seems the attachment stripper either doesn't work, or I have it coded incorrectly. Am currently running a script without the attachment stripper. It has been running for an hour or so, and those messages appear all the time - "Complex regular subexpression recursion limit (32766) exceeded at /usr/share/perl5/Email/Address.pm line 108." This most likely has something to do with a regex on the attachments. Hence the need to process email files without attachments.

    Was looking through some small scripts I have here that look only for "From:", "To:", "Cc" and "Bcc". That led to using Email::Simple ..

    #!/usr/bin/env perl # use strict; use warnings; use File::Find; use Email::Simple; use File::Slurp qw( read_file ); my $directory = '/home/******/Mail/.family.directory/Browne, Bill & Ma +rtha'; my $outfile = 'output2.txt'; my @found_files; find( sub { push @found_files, $File::Find::name }, $directory ); foreach(@found_files) { my $file = "$_"; if (-f $file) { print $_,"\n"; my $intext = File::Slurp::read_file( $file ); my $mail = Email::Simple->new($intext); my $from_header = $mail->header("From"); my $to_header = $mail->header("To"); my $date_header = $mail->header("Date"); my $cc_header = $mail->header("CC"); my $bcc_header = $mail->header("BCC"); my @emails = ""; push @emails, ($from_header, $to_header); if( length $cc_header ) { push @emails, $cc_header; } if( length $bcc_header ) { push @emails, $bcc_header; } File::Slurp::write_file( $outfile, {append => 1 }, join("\n", @ema +ils ) ); } }

    This took about 2 seconds to process all the 592 emails, and successfully output the names and emails to a file. Just a few observations:

    My use of an array needs improving

    I'm unsure if the $mail->header("CC"); will also read line/s with "Cc" or "cc". The same is true for BCC.

    Where there are a lot of emails, I need to format them so that every "," is replaced so that it becomes a seperate entry in the array. At present it is one large string with email names/address, seperated with a comma. (Will need to be careful where a "," is in the name though). How do I do that ?

      The last solution is not exactly what I need. It misses all the (text or html) attachments that have email address in them. This code is more like what I need ..

      #!/usr/bin/perl # Print the types of messages in the folder. Multi-part messages will # be shown with all their parts. # # This code can be used and modified without restriction. # Mark Overmeer, <********@overmeer.net>, 9 nov 2001 use warnings; use strict; use lib '..', '.'; use Mail::Box::Manager 2.00; use Email::Address; use File::Slurp qw( write_file ); my $outfile = 'multipart4_out.txt'; sub show_type($;$); # # Get the command line arguments. # die "Usage: $0 folderfile\n" unless @ARGV==1; my $filename = shift @ARGV; # # Open the folder # my $mgr = Mail::Box::Manager->new; my $folder = $mgr->open ( $filename , extract => 'LAZY' # never take the body unless needed ); # which saves memory and time. die "Cannot open $filename: $!\n" unless defined $folder; # # List all messages in this folder. # my @messages = $folder->messages; print "Mail folder $filename contains ", scalar @messages, " messages: +\n"; my $counter = 1; foreach my $message (@messages) { printf "%3d. ", $counter++; print $message->get('Subject') || '<no subject>', "\n"; show_type $message; get_header_values($message); } sub get_header_values($) { my $msg = shift; # return all Mail::Message objects my @from = $msg->from; my $sender = $msg->sender; my $subject = $msg->subject; my $msgid = $msg->messageId; my @to = $msg->to; my @cc = $msg->cc; my @bcc = $msg->bcc; my @dest = $msg->destinations; my $replyto = $msg->get('Reply-To'); # need code here to use @from, @to, @cc , @bcc and $replyto, and p +ush out those emails (name/email address) to an array, # then output to $outfile, like below #File::Slurp::write_file( $outfile, {append => 1 }, join("\n", @em +ails) ); } sub show_type($;$) { my $msg = shift; my $indent = (shift || '') . ' '; # increase indentation print $indent, " type=", $msg->get('Content-Type'), ', ' , $msg->size, " bytes\n"; if($msg->isMultipart) { foreach my $part ($msg->parts) { my $content_type = $part->contentType; # the only parts we are interested in are text and html - bypa +ss all others if (defined($content_type) && $content_type eq 'text/plain') or (defined($content_type) && $content_type eq 'text/html') { my @emails = Email::Address->parse($part->body); #this isn' +t working properly #File::Slurp::write_file( $outfile, {append => 1 }, join("\n", + @emails) ); } show_type $part, $indent; } } } # # Finish # $folder->close;

      But there are some errors when running it ..

      "my" variable $part masks earlier declaration in same statement at multipart4.pl line 97. main::get_header_values() called too early to check prototype at multipart4.pl line 57. syntax error at multipart4.pl line 92, near ") or" syntax error at multipart4.pl line 100, near "}"

      I don't think the "OR" is constructed properly. Why is there a msg on the sub get_header_values ?