Faile has asked for the wisdom of the Perl Monks concerning the following question:

1st question:
Is there a library out there to read SMTP messages (written to disk in text form with attachments still encoded in various ways) and reliably extract the names of any attachments from them (regardless of sending client)?

Question answered by Alexander

2nd question:

Situation: I have an application that writes messages in raw format to disk (it's a security backup feature in case an attachment gets lost somewhere). I want to verify that all the attachments listed in the raw MIME messages are available in an archive directory. Because of application processing, the archived files have timestamps prepended to their names, so the names do not match 1:1.

Solution design:
- read all MIMEs and store expected filenames in an array
- index all the files in the archive and store those in an array
- look for each filename in the archive and warn if none is found

Parameters:
- There are several tens of thousands of files
- Verification is done by hand, and only occasionally

In this snippet I've replaced the input logic with two arrays of sample data:

#!/usr/bin/perl -w
use warnings;
use strict;

my @attfiles = ( 'foo.txt', 'faa.xml', 'fii.pdf' );
my @arcfiles = (
    'x:\archive\1234567890123_foo.txt',
    'x:\archive\1234567890123_fuu.xml',
    'x:\archive\1234567890123_fii.pdf'
);

foreach my $att (@attfiles) {
    my $found = 0;
    foreach my $arc (@arcfiles) {
        my $result = index($arc, $att);
        if ($result >= 0) {
            print "Found $att in $arc\n";
            $found = 1;
            last;
        }
    }
    unless ($found) {
        print "WARNING: Could not find $att\n";
    }
}

My second idea was to replace the index() call with a simple regexp match, because what interests me is "is it there?", not "where is it?".

#!/usr/bin/perl -w
use warnings;
use strict;

my @attfiles = ( 'foo.txt', 'faa.xml', 'fii.pdf' );
my @arcfiles = (
    'x:\archive\1234567890123_foo.txt',
    'x:\archive\1234567890123_fuu.xml',
    'x:\archive\1234567890123_fii.pdf'
);

foreach my $att (@attfiles) {
    my $found = 0;
    foreach my $arc (@arcfiles) {
        # \Q...\E quotes regexp metacharacters, so the "." in a
        # file name matches only a literal dot
        if ( $arc =~ m/\Q$att\E/ ) {
            print "Found $att in $arc\n";
            $found = 1;
            last;
        }
    }
    unless ($found) {
        print "WARNING: Could not find $att\n";
    }
}

There's most likely TMTOWTDI and I was wondering if this is something someone has already solved very neatly, perhaps in a different way?

Thanks for all your help in advance, I am but a humble padawan. :)

Replies are listed 'Best First'.
Re: Attachments in emails and finding matches and cross referencing
by afoken (Chancellor) on Nov 25, 2010 at 13:05 UTC

    Question 1: There are a lot of Mail and MIME handling modules on CPAN, typically they have mail or mime in their name. Just try to search. I use MIME::Parser in one of my scripts to handle all kinds of MIME attachments.
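
As an illustration of the MIME::Parser route mentioned above, here is a minimal sketch that lists attachment names for raw message files given on the command line. It is not taken from afoken's script. The helper name attachment_names is hypothetical, and the sketch assumes MIME::Parser (from the MIME-tools distribution on CPAN) is installed and that each file holds exactly one message. File names that use RFC 2047 encoded-words may need extra decoding, which is not handled here.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use MIME::Parser;

# Hypothetical helper: collect the recommended file name of every
# part of a parsed message that carries one.
sub attachment_names {
    my ($entity) = @_;
    my @names;
    # parts_DFS walks the entity itself and all nested parts
    for my $part ($entity->parts_DFS) {
        my $name = $part->head->recommended_filename;
        push @names, $name if defined $name && length $name;
    }
    return @names;
}

my $parser = MIME::Parser->new;
$parser->output_to_core(1);    # keep decoded bodies in memory, write no files

for my $file (@ARGV) {
    my $entity = $parser->parse_open($file)
        or die "Cannot parse $file\n";
    print "$file: $_\n" for attachment_names($entity);
}
```

Keeping the bodies in core is convenient for a sketch, but with large attachments and tens of thousands of messages it may be better to let the parser write to a scratch directory instead.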

    Question 2: Not really an answer, but are you trying to re-invent maildir, maildir++, or IMAPdir? If not, consider using them. One e-mail == one file. No need to mess with attachments in separate files.

    Alexander

    --
    Today I will gladly share my knowledge and experience, for there are no sweeter words than "I told you so". ;-)
      Alexander,

      1: Thanks for your response. I found MIME::Parser a little after writing that first question, thanks for the pointer :)

      2: No, I'm not trying to handle the mail at all, I'm just interested in the attachments. I want to verify from a 2nd location that all the attachments have gotten extracted from mail and saved to disk by an external application that I cannot see.
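
The cross-check described in the question can also be sketched with a hash lookup instead of nested loops, which avoids comparing every attachment against every archive file when there are tens of thousands of each. This is only a sketch under stated assumptions: the timestamp prefix is taken to be a run of digits followed by an underscore (as in the sample data), names are compared case-insensitively because the sample paths look like Windows paths, and the helper names original_name and missing_attachments are hypothetical.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Return the file name of an archive path with the timestamp prefix removed.
# Assumption: the prefix is digits followed by "_", e.g. 1234567890123_foo.txt
sub original_name {
    my ($path) = @_;
    my ($name) = $path =~ m{([^\\/]+)\z};    # last path component
    return unless defined $name;
    $name =~ s/\A\d+_//;
    return $name;
}

# Return the attachment names that have no counterpart in the archive.
sub missing_attachments {
    my ($attfiles, $arcfiles) = @_;
    my %in_archive;
    for my $arc (@$arcfiles) {
        my $name = original_name($arc);
        # Assumption: case-insensitive file system, so compare lowercased
        $in_archive{ lc $name } = 1 if defined $name;
    }
    return grep { !$in_archive{ lc $_ } } @$attfiles;
}

my @attfiles = ( 'foo.txt', 'faa.xml', 'fii.pdf' );
my @arcfiles = (
    'x:\archive\1234567890123_foo.txt',
    'x:\archive\1234567890123_fuu.xml',
    'x:\archive\1234567890123_fii.pdf',
);

print "WARNING: Could not find $_\n"
    for missing_attachments( \@attfiles, \@arcfiles );
```

Unlike the substring search, this requires the whole name to match after the prefix is stripped, so bar.txt would no longer be reported as found just because 1234567890123_foobar.txt exists.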