bobdole has asked for the wisdom of the Perl Monks concerning the following question:

I am not sure if perl is the correct language for this, I am open to other language suggestions if you can think of a better one, maybe php?

My question is I have emails coming to a specified email address on our server with an attached PDF. What I need to do is whenever that email makes it to our server I need to pull off the PDF attachment base64 encode that PDF and send it off through a web service. And this all needs to be done automatically. Is this possible? A point in the right direction would be very appreciated. Thanks.

Replies are listed 'Best First'.
Re: pull off PDF attachment
by brian_d_foy (Abbot) on Jan 27, 2006 at 01:55 UTC

    All of this is possible in Perl (although it's possible in a lot of other languages too).

    There are many email handling modules, the MIME::Parser module can easily extract email attachments (as I show in Re: Save Email Attachment to File), and there are a couple of modules that can do the uuencoding. Just about everything you need you'll find on CPAN Search.

    Depending on the web service, I'm sure there's some module that handles that too.

    Beyond that, I think you'd need to let us in on some details if you want specific help with something.

    Good luck! :)

    --
    brian d foy <brian@stonehenge.com>
    Subscribe to The Perl Review
      Awesome. Thanks, I will get started on it and I will check back with you guys if I get stuck.
Re: pull off PDF attachment
by chargrill (Parson) on Jan 27, 2006 at 04:07 UTC

    Here's a script where I've used Mime::Parser to strip attachments off of emails... Now, these are jpg's, sent from my phone, and the script actually creates a thumbnail and deposits the original plus thumbnail in separate directories...

    Keep in mind, this code is not "production" level and therefore not in great shape, but it suffices for my purposes.

    #!/usr/bin/perl -w use strict; use MIME::Parser; use MIME::Base64 qw(decode_base64); use Image::Magick::Thumbnail; my $imagedir = '/www/mainimagedir/photos'; my $archivedir = '/www/mainimagedir/photos/arch'; my $thumbdir = '/www/mainimagedir/photos/thumbs'; my $homepage_image = $imagedir . '/image.jpg'; my $archive_image = $archivedir . '/' . time . '.jpg'; my $homepage_thumb = $thumbdir . '/image_thumb.jpg'; my $archive_thumb = $thumbdir . '/' . time . '_thumb.jpg'; open LOG, ">>/www/mainimagedir/photos/photoblog.log"; print LOG "############## NEW RUN \@" . time . " ################\n"; print LOG "homepage_image: $homepage_image\n"; print LOG "archive_image : $archive_image\n"; print LOG "homepage_thumb: $homepage_thumb\n"; print LOG "archive_thumb : $archive_thumb\n"; my $parser = new MIME::Parser; $parser->output_to_core(1); print LOG "created parser\n"; my $message = $parser->parse(\*STDIN); print LOG "-=-=-=-=-=-=-=-=-=- message -=-=-=-=-=-=-=-=-=-\n$message\n +-=-=-=-=-=-=-=-=-=- end msg -=-=-=-=-=-=-=-=-=-\n"; foreach my $part ($message->parts_DFS) { print LOG "processing part\n"; if ($part->bodyhandle) { if ($part->mime_type eq 'image/jpeg') { print LOG "ok, it's a jpeg image we're dealing wit +h\n"; my $data = $part->as_string; print LOG "===================== image data == +===================\n$data\n===================== end image ======== +=============\n"; my @raw_part = split(/\n/, $data); my @edited_part; foreach my $line (@raw_part) { next if ($line =~ /^Content/); next if ($line =~ /^\s+$/); next if ($line =~ /^$/); next if ($line =~ /name/); next if ($line =~ /mode/); next if ($line =~ /x\-mac/); push @edited_part, $line; } $data = join("\n", @edited_part); print LOG "===================== image data == +===================\n$data\n===================== end image ======== +=============\n"; $data = decode_base64($data); open HOME, "> $homepage_image" or die "Couldn' +t open $homepage_image for write: $!\n"; print HOME $data; close HOME; print LOG "wrote homepage image\n"; open ARCHIVE, "> $archive_image" or die "Could +n't open $archive_image for write: $!\n"; print ARCHIVE $data; close ARCHIVE; print LOG "wrote archive image\n"; my $src = new Image::Magick; print LOG "created ImageMagick object\n"; $src->Read($homepage_image); print LOG "read homepage_image\n"; my ($thumb,$x,$y) = Image::Magick::Thumbnail:: +create($src,64); print LOG "created thumbnail\n"; $thumb->Write("$homepage_thumb"); $thumb->Write("$archive_thumb"); print LOG "wrote thumbnails\n"; } print LOG "processed image part\n"; } print LOG "tail end of some loop\n"; } close LOG;

    The biggest thing I had problems with was the parsing out some of the mime headers, hence the ton 'o' logging, which is normally commented out during heavy use.



    --chargrill
    $/ = q#(\w)# ; sub sig { print scalar reverse join ' ', @_ } + sig map { s$\$/\$/$\$2\$1$g && $_ } split( ' ', ",erckha rlPe erthnoa stJu +" );
Re: pull off PDF attachment
by glasswalk3r (Friar) on Jan 27, 2006 at 11:43 UTC

    Depending on the size of the PDF files that you're going to retrieve, using a webservice may not be the best idea: I supposed it will be quite slow to transfer. If you just need to transfer the PDF it would be better to compress it before and use HTTP, FTP or whatever to transfer it.

    Alceu Rodrigues de Freitas Junior
    ---------------------------------
    "You have enemies? Good. That means you've stood up for something, sometime in your life." - Sir Winston Churchill
Re: pull off PDF attachment
by bobdole (Beadle) on Jan 27, 2006 at 19:14 UTC
    I actually came across some really good base code that I modified to do what I need. Where I am stuck now is I am trying to move a message after I have removed the attachment to a different folder so I am not constantly looking at the same emails. Here is what I have so far:

    #!/usr/bin/perl -w use MIME::Base64; $|++; my $VERSION = "1.0"; use Getopt::Long; my %opts; # make sure we have the modules we need, else die peacefully. eval("use Net::POP3;"); die "[err] Net::POP3 not installed.\n" if $@; eval("use MIME::Parser;"); die "[err] MIME::Parser not installed.\n" i +f $@; # define our command line flags (long and short versions). GetOptions(\%opts, 'server|s=s', # the POP3 server to use. 'username|u=s', # the POP3 username to use. 'password|p=s', # the POP3 password to use. 'begin|b=i', # what msg number to start at. ); $opts{server} = "localhost"; $opts{username} = "username"; $opts{password} = "password"; # at the very least, we need our login information. die "[err] POP3 server missing, use --server or -s.\n" unless $opts{se +rver}; die "[err] Username missing, use --username or -u.\n" unless $opts{use +rname}; die "[err] Password missing, use --password or -p.\n" unless $opts{pas +sword}; # try an initial connection to the server. print "-" x 76, "\n"; # merely a visual seperator. my $conn = Net::POP3->new( $opts{server} ) or die "[err] There was a problem connecting to the server.\n"; print "Connecting to POP3 server at $opts{server}.\n"; # and now the login information. $conn->login( $opts{username}, $opts{password} ) or die "[err] There was a problem logging in (.poplock? credentials? +).\n"; print "Connected successfully as $opts{username}.\n"; # purdy stats about our mailbox. my ($msg_total, $mbox_size) = $conn->popstat( ); if ($msg_total eq 0) { print "No new emails are available.\n"; exit; +} if ($msg_total eq '0E0') { print "No new emails are available.\n"; ex +it; } print "You have $msg_total messages totalling ", commify($mbox_size), +"k.\n"; # the list of valid file extensions. we do extensions, not # mime-types, because they're easier to understand from # an end-user perspective (no research is required). my $valid_exts = "pdf"; my %msg_ids; # used to keep track of seen emails. my $msg_num = $opts{begin} || 1; # user specified or 1. # create a subdirectory based on today's date. my ($d,$m,$y) = (localtime)[3,4,5]; $y += 1900; $m++; $d = sprintf "%02.0d", $d; $m = sprintf "%02.0d", $m; print "Using directory '$y-$m-$d' for newly downloaded files.\n"; my $savedir = "$y-$m-$d"; mkdir($savedir, 0777); # begin looping through each msg. print "-" x 76, "\n"; # merely a visual seperator. while ($msg_num <= $msg_total) { # the size of the individual email. my $msg_size = $conn->list($msg_num); # get the header of the message # so we can check for duplicates. my $headers = $conn->top($msg_num); # print/store the good bits. my ($msg_subj, $msg_id); foreach my $header (@$headers) { # print subject line and size. if ($header =~ /^Subject: (.*)/) { $msg_subj = substr($1, 0, 50); # trim subject down a bit. print "Msg $msg_num / ",commify($msg_size),"k / $msg_subj. +..\n"; } if ($header =~ /^Date: (.*)/) { my $msg_date = substr($1, 0, 50); # trim subject down a bi +t. print "Date $msg_date\n"; } # save Message-ID for duplicate comparison. elsif ($header =~ /^Message-ID: <(.*)>/i) { $msg_id = $1; $msg_ids{$msg_id}++; } # move on to the filtering. elsif ($msg_subj and $msg_id) { last; } } # if the message size is too small, then it # could be a reply or something of low quality. if (defined($msg_size) and $msg_size < 40) { print " Skipping - message size is smaller than our threshold +.\n"; $msg_num++; next; } # check for matching Message-ID. If found, # skip this message. This will help eliminate # crossposting and duplicate downloads. if (defined($msg_id) and $msg_ids{$msg_id} >= 2) { print " Skipping - we've already seen this Message-ID.\n"; $msg_num++; next; } # get the message to feed to MIME::Parser. my $msg = $conn->get($msg_num); # create a MIME::Parser object to # extract any attachments found within. my $parser = new MIME::Parser; $parser->output_dir( $savedir ); #my $enmsg = encode_base64($msg); my $entity = $parser->parse_data($msg); # extract our mime parts and go through each one. my @parts = $entity->parts; foreach my $part (@parts) { # determine the path to the file in question. my $path = ($part->bodyhandle) ? $part->bodyhandle->path : und +ef; # move on if it's not defined, # else figure out the extension. next unless $path; $path =~ /\w+\.([^.]+)$/; my $ext = $1; next unless $ext; # we continue only if our extension is correct. my $continue; $continue++ if $valid_exts =~ /$ext/i; # delete the blasted thing. unless ($valid_exts =~ /$ext/) { print " Removingg unwanted filetype ($ext): $path\n"; unlink $path or print " > Error removing file at $path: $!. +"; next; # move on to the next attachment or message. } # a valid file type. yummy! print " Keeping valid file: $path.\n"; my $encodedfile = base64_encode_file($path); #print "$encodedfile"; } # increase our counter. $msg_num++; } # clean up and close the connection. $conn->quit; # now, jump into our savedir and remove all msg-* # files, which are message bodies saved by MIME::Parser. chdir ($savedir); opendir(SAVE, "./") or die $!; my @dir_files = grep !/^\.\.?$/, readdir(SAVE); closedir(SAVE); foreach (@dir_files) { unlink if $_ =~ /^msg-/; } # cookbook 2.17. sub commify { my $text = reverse $_[0]; $text =~ s/(\d\d\d)(?=\d)(?!\d*\.)/$1,/g; return scalar reverse $text; } sub base64_encode_file { # Initialize variables local($file) = $_[0]; local($encoded, $line) = ''; local($len, $bytes, $pad) = 0; # Open the file if (open (DATA, "<$file")) { # Process the data while ($bytes = read(DATA, $line, 45)) { $len += $bytes; # uuencode the line and remove the first and last characte +rs $encoded .= substr(pack('u', $line), 1); chop($encoded); } # Convert from uuencoded to base64 $encoded =~ tr| -_`|A-Za-z0-9+/A|; $pad = (3 - ($len % 3)) % 3; substr($encoded, -$pad, $pad) = '=' x $pad; $encoded =~ s/(.{76})/$1\n/g; } else { $Error_Message = "The file \"$file\" could not be opened ($!). +"; } # Return the result (null if the file couldn't be opened) return($encoded); }
    As long as the code is running it keeps track of seen emails but every time you start the code over it looks at the same email messages.