Hi All,

I am having a slight problem with a script I have been working on to parse my emails on a Windows 7 machine.

The script is composed mostly of bits and pieces of other scripts I have found over the last few months. Originally, I was trying to have it accomplish two tasks - 1) Extract attachments to a specific directory as long as they are from a certain address, and 2) Extract all plaintext messages if they are also from specific addresses.

#!/usr/local/bin/perl -w use MIME::Parser; use IO::File; use Email::MIME::Attachment::Stripper; use File::Slurp qw(write_file); my $output_path = 'C:\\phone\\'; my $word_to_delete = "deleteme"; my $movedir = 'completed'; use Cwd qw(); @files = <*.eml>; foreach $file (@files) { my $parser = MIME::Parser->new(); $parser->output_dir($output_path); $parser->output_prefix('deleteme'); $parser->output_to_core(0); $parser->extract_nested_messages(1); $parser->parse_nested_messages(1); open(INPUT, "$file") or die("Input error: $!"); local $/ = undef; my $textstuff = <INPUT>; my $entity = $parser->parse_open($file) or die ("Input error: $!"); close (INPUT); my $pathx = Cwd::cwd(); my $header = $entity->head; $subject = $header->get('Subject'); $to = $header->get('To'); $from = $header->get('From'); $date = $header->get('Date'); $mime = $entity->mime_type; $effective = $entity->effective_type; print "[File]: $file...\n"; if ($from =~ "address" || $from =~ "address2" || $from =~ "address3" | +| $from =~ "address4" || $from =~ "address5" || $from =~ "address6" | +| $from =~ "address7") { if ($to =~ "address" || $to =~ "address2" || $to =~ "address3" || $to +=~ "address4" || $to =~ "address5" || $to =~ "address6" || $to =~ "ad +dress7") { if ($textstuff =~ /sent from my ipod/ig || $textstuff =~ /sent from my + iphone/ig) { print "Subject: $subject\nFrom: $from\nTo: $to\nDate: $date\nContent T +ype: content\nMime Type: $mime\nEffective Type: $effective\n"; open PART2, $file; while (<PART2>) { open (LOG, ">>C:\\phone\\MyStuff.txt"); $_ =~ s/Sent from my iPod//ig; $_ =~ s/Sent from my iPhone//ig; if ($entity->parts > 0) { print ">> Multi Part Message\n"; for (my $i=0;$i<$entity->parts;$i++) { my $subEntity = $entity->parts($i); print ">> MIME: ", $subEntity->mime_type," \n"; if ($subEntity->mime_type eq 'text/plain') { if (my $io = $subEntity->open("r")) { while (defined($_=$io->getline)) { $body2 = $_; $body2 =~ s/Sent from my iPod//ig; $body2 =~ s/Sent from my iPhone//ig; $body2 =~ s/\n//ig; $subject =~ s/\n//ig; print LOG "<a href=\"$body2\">$subject</a><p>\n\n"; } } } } } else { my $body = join "", @{$entity->body}; print ">> Not a multipart\n"; print $body; $body =~ s/Sent from my iPod//ig; $body =~ s/Sent from my iPhone//ig; $body =~ s/\n//ig; $subject =~ s/\n//ig; print LOG "<a href=\"$body\">$subject</a><p>\n\n"; close (LOG); }} close PART2; }} my $m = Email::MIME::Attachment::Stripper->new($textstuff, \'force_fil +ename' => 1); my $msg = $m->message; my @attachments = $m->attachments; foreach my $a(@attachments) { if ($a->{filename} ne "") { my $img = new IO::File "C:/phone/" . $a->{filename}, "w" or die "Can not create file! $!"; binmode $img; print $img $a->{payload}; print " -- Extracting -- : " . $a->{filename} . "..\n"; $entity->dump_skeleton; } } }if (!-d $movedir) { mkdir $movedir or die "Couldn't create $movedir - $!\n"; } if (-e $movedir) { print "Moving $pathx/$file to $pathx/$movedir/$file\n\n"; rename $pathx . "/" . $file, $pathx . "/" . $movedir . "/" . $file or +die "Couldn't move $file to $movedir - $!\n"; } unlink (glob("$output_path/*$word_to_delete*")) or warn "can't delete +files: $!"; $entity->dump_skeleton; }

Extracting attachments works in most cases (I still end up with files named like '1ddfdfdf' with no file type at times), and usually the text extraction will work as long as it has the ipod text above. The script parses to see if it was sent by my ipod touch or iphone, and in turn grabs links that I have sent myself to view later.

The problem comes when the file is a link from, as an example, address3 above. Address3 would be my old cell phone (not an iphone or my old ipod touch), and the email message will contain one part, which is text/plain. Since the 'sent from ipod' never matches, it never logs it. I have tried to edit the pattern matching part to start logging by making it into this:

if ($textstuff =~ /sent from my ipod/ig || $textstuff =~ /sent from my + iphone/ig || $from =~ /address3/ig) {

As well as

if ($textstuff =~ /sent from my ipod/ig || $textstuff =~ /sent from my + iphone/ig || $_ =~ /address3/ig) {

Unfortunately, it never seems to trigger the log opening up and adding the links it finds to the log. Any ideas on what would make this work better, and actually log the urls? Also, does anyone have any pointers or code changing recommendations on how to make it log any messages that match any of the from addresses, IF they don't have an attachment? The way I had it written previously, it would log all files, including the binary code, to a text file. It started to add up when I had over 300 attachments.

For anyone wondering, I am basically trying to build a bot of sorts to go through and extract old links, and I plan to use lwp simple or the Firefox mechanize module to crawl those websites and save a local copy for me.

Thanks all for your help!


In reply to Perl MIME parser partially works by CalebH

Title:
Use:  <p> text here (a paragraph) </p>
and:  <code> code here </code>
to format your post, it's "PerlMonks-approved HTML":



  • Posts are HTML formatted. Put <p> </p> tags around your paragraphs. Put <code> </code> tags around your code and data!
  • Titles consisting of a single word are discouraged, and in most cases are disallowed outright.
  • Read Where should I post X? if you're not absolutely sure you're posting in the right place.
  • Please read these before you post! —
  • Posts may use any of the Perl Monks Approved HTML tags:
    a, abbr, b, big, blockquote, br, caption, center, col, colgroup, dd, del, details, div, dl, dt, em, font, h1, h2, h3, h4, h5, h6, hr, i, ins, li, ol, p, pre, readmore, small, span, spoiler, strike, strong, sub, summary, sup, table, tbody, td, tfoot, th, thead, tr, tt, u, ul, wbr
  • You may need to use entities for some characters, as follows. (Exception: Within code tags, you can put the characters literally.)
            For:     Use:
    & &amp;
    < &lt;
    > &gt;
    [ &#91;
    ] &#93;
  • Link using PerlMonks shortcuts! What shortcuts can I use for linking?
  • See Writeup Formatting Tips and other pages linked from there for more info.