1st question:
Is there a library out there to read SMTP messages (written to disk in text form with attachments still encoded in various ways) and reliably extract the names of any attachments from them (regardless of sending client)?

Question answered by Alexander

2nd question:

Situation: I have an application that writes messages in raw format to disk (a security backup feature in case an attachment gets lost somewhere). I want to verify that all the attachments listed in the raw MIME messages are available in an archive directory. Because of application processing, the archived files have a timestamp prepended to their names, so the names do not match 1:1.

Solution design:
- read all MIME messages and store the expected filenames in an array
- index all the files in the archive and store those in an array
- look for each filename in the archive and warn if none is found
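
The three steps above could also be sketched with a hash instead of a second array, so that each expected name costs one lookup rather than a scan of the whole archive list. This is only a rough sketch: the helper name missing_attachments is made up for illustration, and it assumes every archived file is named "<digits>_<original name>", as in the sample data further down.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: returns the expected attachment names that have no
# counterpart in the archive listing.
# Assumption: archived files are named "<digits>_<original name>".
sub missing_attachments {
    my ($expected, $archived) = @_;

    my %in_archive;
    for my $path (@$archived) {
        # Drop everything up to the last slash or backslash.
        (my $name = $path) =~ s{.*[\\/]}{};
        # Drop the assumed timestamp prefix.
        $name =~ s/^\d+_//;
        $in_archive{$name} = $path;
    }

    # Keep only the expected names that were never seen in the archive.
    return grep { !exists $in_archive{$_} } @$expected;
}

my @attfiles = ('foo.txt', 'faa.xml', 'fii.pdf');
my @arcfiles = (
    'x:\archive\1234567890123_foo.txt',
    'x:\archive\1234567890123_fuu.xml',
    'x:\archive\1234567890123_fii.pdf',
);

print "WARNING: Could not find $_\n"
    for missing_attachments(\@attfiles, \@arcfiles);
```

Because the comparison is on the whole name after the prefix is stripped, 'foo.txt' would not be satisfied by an archived file such as '1234567890123_barfoo.txt', which a plain substring search would accept. If the real prefix format differs from the assumed one, the second substitution is the only line that needs to change.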

Parameters:
- There are several tens of thousands of files
- Verification is run by hand, and only occasionally

In this snippet I've replaced the input logic with two arrays of sample data:

#!/usr/bin/perl -w
use warnings;
use strict;

my @attfiles = ( 'foo.txt', 'faa.xml', 'fii.pdf' );
my @arcfiles = (
    'x:\archive\1234567890123_foo.txt',
    'x:\archive\1234567890123_fuu.xml',
    'x:\archive\1234567890123_fii.pdf'
);

foreach my $att (@attfiles) {
    my $found = 0;
    foreach my $arc (@arcfiles) {
        my $result = index($arc, $att);
        if ($result >= 0) {
            print "Found $att in $arc\n";
            $found = 1;
            last;
        }
    }
    unless ($found) {
        print "WARNING: Could not find $att\n";
    }
}

My second idea was to replace the index() call with a simple regexp match, because what interests me is "is it there?", not "where is it?".

#!/usr/bin/perl -w
use warnings;
use strict;

my @attfiles = ( 'foo.txt', 'faa.xml', 'fii.pdf' );
my @arcfiles = (
    'x:\archive\1234567890123_foo.txt',
    'x:\archive\1234567890123_fuu.xml',
    'x:\archive\1234567890123_fii.pdf'
);

foreach my $att (@attfiles) {
    my $found = 0;
    foreach my $arc (@arcfiles) {
        # \Q...\E quotes the filename so that "." is matched literally
        if ( $arc =~ m/\Q$att\E/ ) {
            print "Found $att in $arc\n";
            $found = 1;
            last;
        }
    }
    unless ($found) {
        print "WARNING: Could not find $att\n";
    }
}
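
One caveat when moving from index() to a regexp match: a filename interpolated into a pattern is treated as a pattern, so the "." in 'foo.txt' matches any character unless the name is quoted. The small comparison below is a sketch with a made-up archive entry ('..._fooXtxt') chosen only to show the difference; the variable names are illustrative.

```perl
use strict;
use warnings;

my $att = 'foo.txt';
my $arc = 'x:\archive\1234567890123_fooXtxt';    # deliberately NOT the file we want

my $by_index  = index($arc, $att) >= 0 ? 1 : 0;  # literal substring search
my $by_regex  = $arc =~ /$att/         ? 1 : 0;  # unquoted: "." matches any character
my $by_quoted = $arc =~ /\Q$att\E$/    ? 1 : 0;  # quoted and anchored at the end

print "index: $by_index, regex: $by_regex, quoted: $by_quoted\n";
# prints "index: 0, regex: 1, quoted: 0"
```

Only the unquoted pattern reports a false match here. Anchoring with a trailing $ is an extra choice that fits names with a prepended timestamp, since the original name is then always at the end of the archived name.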

There is most likely more than one way to do it (TMTOWTDI), and I was wondering whether someone has already solved this very neatly, perhaps in a different way?

Thanks for all your help in advance, I am but a humble padawan. :)


In reply to Attachments in emails and finding matches and cross referencing by Faile
