le has asked for the wisdom of the Perl Monks concerning the following question:

Well, my problem is:

I need to parse incoming emails and store the relevant parts (header, subject, sender and so on) in a database. That's not a big deal for plain ASCII emails, but some emails come with attachments, and that's where the problems start. I have already gone through 'perldoc MIME::Tools' and related docs, but I found them pretty confusing: I can't tell when MIME::Parser is doing the work, when MIME::Entity is, or where the other Mail modules come in. I have already set up a parser that handles most emails, but some are rejected and I don't know why.
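On how the modules divide the work: in MIME-tools, MIME::Parser reads the raw message, and what it returns is a MIME::Entity. A multipart entity has child entities (`parts`) and no body; a leaf entity has a decoded body (`bodyhandle`) and no parts. Attachments can sit several levels deep (for example multipart/alternative inside multipart/mixed), so walking the tree recursively is usually safer than looking only at the top-level parts. Below is a minimal sketch, assuming a MIME-tools 5.x install (for `parse_data` and `output_to_core(1)`); `leaf_parts` and `summarize_message` are hypothetical helper names and the sample message is made up.

```perl
use strict;
use warnings;
use MIME::Parser;

# Recursively collect the leaf (non-multipart) parts of a MIME::Entity tree.
# A multipart entity has child parts but no body; a leaf has a body but no parts.
sub leaf_parts {
    my ($entity) = @_;
    my @children = $entity->parts;
    return @children ? map { leaf_parts($_) } @children : ($entity);
}

# Parse a raw message held in a string and return a hashref with the header
# fields to store plus one entry (type, filename, decoded body) per leaf part.
sub summarize_message {
    my ($raw) = @_;
    my $parser = MIME::Parser->new;
    $parser->output_to_core(1);    # keep decoded bodies in memory, no temp files
    my $entity = $parser->parse_data($raw);

    my %msg;
    for my $field (qw(From Subject)) {
        my $value = $entity->head->get($field);
        $value = '' unless defined $value;
        chomp $value;               # header values come back with a trailing newline
        $msg{$field} = $value;
    }
    $msg{parts} = [
        map {
            +{
                type     => $_->mime_type,
                filename => $_->head->recommended_filename,
                body     => $_->bodyhandle->as_string,
            }
        } leaf_parts($entity)
    ];
    return \%msg;
}

# Made-up sample: a text part plus a base64-encoded attachment.
my $sample = <<'EOM';
From: alice@example.com
Subject: Report
MIME-Version: 1.0
Content-Type: multipart/mixed; boundary="XYZ"

--XYZ
Content-Type: text/plain

See attached.
--XYZ
Content-Type: application/octet-stream; name="data.bin"
Content-Disposition: attachment; filename="data.bin"
Content-Transfer-Encoding: base64

aGVsbG8=
--XYZ--
EOM

my $msg = summarize_message($sample);
print "$msg->{Subject}: $_->{type}\n" for @{ $msg->{parts} };
```

For a real mail on STDIN, the same helpers apply after reading it into a string, or after calling `$parser->parse(\*STDIN)` directly.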

I wonder if some monk here could give me a hint.

Should I post the parser code here? (It's rather long, so I don't want to post it now.)

Re: How to parse emails with and without attachments?
by Punto (Scribe) on Jun 06, 2000 at 19:03 UTC
    Should I post the parser code here? (It's rather long, so I don't want to post it now.)

    I'm writing the exact same thing (for a "free webmail" program). I used the MIME Attachment Extractor code I found in the Snippets section, but I'd be interested in looking at your code (the snippet is perfect, but it wasn't _exactly_ what I needed).

      Here's the code:
      #!/usr/bin/perl -w
      use MIME::Parser;
      use DBI;
      use strict;

      my $database  = "xxxxxxx";
      my $dbuser    = "xxxxxxx";
      my $dbpasswd  = "xxxxxxx";
      my $outputdir = "/some/directory";

      # Decoded bodies are written to files in $outputdir.
      my $parser = new MIME::Parser;
      $parser->output_dir($outputdir);
      $parser->output_prefix("attachment");
      $parser->output_to_core();

      # Parse the message arriving on STDIN into a MIME::Entity.
      my $entity     = $parser->read(\*STDIN);
      my $msg_header = $entity->stringify_header;
      my $num_parts  = $entity->parts;
      my $msg_body;

      if ($num_parts > 0) {
          # Multipart message: loop over the top-level parts only.
          for (my $i = 0; $i < $num_parts; $i++) {
              my $part = $entity->parts($i);
              my $type = $part->mime_type;
              my $bh   = $part->bodyhandle;    # undef if this part is itself multipart

              if ($type =~ m/text/) {
                  # Text part: append its contents to the body, then delete its file.
                  if (my $io = $part->open("r")) {
                      while (defined($_ = $io->getline)) {
                          $msg_body .= $_;
                      }
                      $msg_body .= "\n\n";
                      $io->close;
                  }
                  my $status = system("rm '$bh->{MB_Path}'");
                  die $! unless $status == 0;
              }
              else {
                  # Any other part: rename its file with a timestamp prefix
                  # and put a link to it into the body.
                  my $file    = $bh->{MB_Path};
                  my $oldfile = $file;
                  $file =~ s/$outputdir\///;
                  my $now = time();
                  $file = $now . "-" . $file;
                  my $newfile = $outputdir . "/" . $file;
                  my $status  = system("mv '$oldfile' '$newfile'");
                  die $! unless $status == 0;
                  $file = "some/directory" . $file;
                  $file = "http://url/" . $file;
                  $msg_body .= "ATTACHMENT: $file\n";
              }
          }
      }
      else {
          # Single-part message: the whole body is the text.
          if (my $io = $entity->open("r")) {
              while (defined($_ = $io->getline)) {
                  $msg_body .= $_;
              }
              $msg_body .= "\n";
              $io->close;
          }
          $entity->purge;
      }

      ### The rest is just database processing ...
      The problem is that the parser dies on some emails at the point where it should remove the temporary files.

      (I know this is dirty code; most of it came out of trial and error.)
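Two plausible reasons the loop above dies, not confirmed against the failing mails: a top-level part that is itself multipart (for example multipart/alternative inside multipart/mixed) has no bodyhandle, so `$bh->{MB_Path}` is undef and `rm ''` fails; and a file name containing a quote character breaks the shell command line. Also, `system` reports failure through `$?`, not `$!`, so `die $!` prints a misleading message, and the URL is built without a slash between `some/directory` and the file name. The sketch below walks the whole part tree, uses MIME::Body's public `path` method instead of the internal `MB_Path` key, and calls Perl's own `unlink` and `rename` instead of the shell. It assumes MIME-tools 5.x; `process_message` and `leaf_parts` are hypothetical names, and the URL prefix mirrors the placeholder in the original code.

```perl
use strict;
use warnings;
use File::Basename qw(basename);
use File::Spec;
use MIME::Parser;

# Collect leaf parts recursively, so multipart containers (which have
# no bodyhandle) are never treated as files.
sub leaf_parts {
    my ($entity) = @_;
    my @children = $entity->parts;
    return @children ? map { leaf_parts($_) } @children : ($entity);
}

# Parse one message from $fh. text/* parts are appended to the body and
# their files deleted; other parts are renamed with a timestamp prefix and
# replaced in the body by a link built from $url_prefix.
sub process_message {
    my ($fh, $outputdir, $url_prefix) = @_;

    my $parser = MIME::Parser->new;
    $parser->output_dir($outputdir);
    $parser->output_prefix('attachment');
    my $entity = $parser->parse($fh) or die "could not parse message\n";

    my $header = $entity->stringify_header;
    my $body   = '';
    for my $part (leaf_parts($entity)) {
        my $bh = $part->bodyhandle or next;    # defensive: leaves normally have one
        my $path = $bh->path;                  # undef for in-core bodies

        if ($part->mime_type =~ m{^text/}) {
            $body .= $bh->as_string . "\n\n";  # read before deleting the file
            if (defined $path) {
                unlink $path or die "cannot unlink $path: $!\n";
            }
        }
        elsif (defined $path) {
            my $name    = time() . '-' . basename($path);
            my $newpath = File::Spec->catfile($outputdir, $name);
            rename $path, $newpath or die "cannot rename $path to $newpath: $!\n";
            $body .= "ATTACHMENT: $url_prefix$name\n";
        }
    }
    return ($header, $body);
}
```

Called as `my ($msg_header, $msg_body) = process_message(\*STDIN, '/some/directory', 'http://url/some/directory/');`, it returns the header and body text for the DBI code. Single-part messages go through the same loop, because `leaf_parts` returns the top-level entity itself when it has no parts.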
Re: How to parse emails with and without attachments?
by Maqs (Deacon) on Jun 07, 2000 at 16:39 UTC
    If you are on a *nix OS, why not pass the email through the metamail utility before running your parser? It turns the email into plain, human-readable text without the technical information and special control characters. As far as I remember, it also supports MIME parsing.
    /Maqs.
      Thanks for the tip, I'll give it a try.
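If you would rather stay in Perl than shell out to metamail, one piece of the same job is turning RFC 2047 encoded words (common in Subject and From, e.g. `=?ISO-8859-1?Q?...?=`) into readable text before storing them. This is not what metamail does internally, just a swapped-in technique for the header part of the problem. The sketch uses the core Encode module's `MIME-Header` encoding; `readable_header` is a hypothetical name, and storing the result as UTF-8 bytes is an assumption about the database.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Decode RFC 2047 encoded words in a raw header value (as returned by
# MIME::Head->get, trailing newline included) and return UTF-8 bytes.
sub readable_header {
    my ($raw) = @_;
    return '' unless defined $raw;
    chomp(my $value = $raw);
    my $text = decode('MIME-Header', $value);    # Perl character string
    return encode('UTF-8', $text);               # bytes for storage
}

print readable_header("=?ISO-8859-1?Q?Caf=E9?= menu\n"), "\n";
```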