Update: Took jdporter's advice. Kept the settings in a hash for convenience.

Update: Added die() flow control at close(), on DigitalKitty's advice.

Previously, archived e-mail was stored in one huge mailbox file (around 3 GB).
To make searching it, and any future manipulation, easier, I wrote a little script
that takes the e-mail data the Exim MTA dumps into individual files and writes it
into a MySQL database.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use DBI;
use File::Copy;

my %const = (
    dbhost        => 'localhost',
    dbname        => 'mail',
    dblogin       => 'mail',
    dbpassword    => 'p4v1li0n',
    dbhandler     => undef,
    statement     => 'INSERT INTO archive VALUES(?,?,?,?,?,?,?,?)',
    maildir       => '/usr/db_mail/',
    currentfile   => '',
    mail_datetime => '',
    mail_headers  => '',
    mail_from     => '',
    mail_to       => '',
    mail_cc       => '',
    mail_subject  => '',
    mail_body     => '',
);

# Connect once per run; RaiseError makes any database failure fatal.
sub dbconnect {
    $const{dbhandler} = DBI->connect(
        "DBI:mysql:$const{dbname}:$const{dbhost}",
        $const{dblogin}, $const{dbpassword},
        { RaiseError => 1, PrintError => 0 },
    );
}

sub dbdisconnect {
    $const{dbhandler}->disconnect() if $const{dbhandler};
}

sub insert {
    # Store only the first MB of the body
    $const{mail_body} = substr $const{mail_body}, 0, 1_000_000;
    # prepare_cached reuses one statement handle for every message
    my $sth = $const{dbhandler}->prepare_cached($const{statement});
    $sth->execute(undef,
        @const{qw(mail_datetime mail_from mail_to mail_cc
                  mail_subject mail_headers mail_body)});
}

sub parse {
    my ($path) = @_;
    # Skip '.', '..', the parsed/ directory and anything outside maildir
    return unless -f $path && index($path, $const{maildir}) == 0;

    open(my $fh, '<', $path) or return;
    my $mail = do { local $/; <$fh> };    # slurp the whole message
    close($fh) or die "Can't close $path: $!";
    return unless defined $mail && length $mail;

    # Headers end at the first empty line; the rest is the body
    my ($headers, $body) = split /\n\n/, $mail, 2;
    $const{mail_headers} = $headers;
    $const{mail_body}    = defined $body ? $body : '';

    # Anchor each header name to the start of a line so that, e.g.,
    # "Delivered-To:" is not mistaken for "To:". A missing header gives undef.
    ($const{mail_datetime}) = $headers =~ m/^Delivery-date:[ \t]*(.*)$/mi;
    ($const{mail_from})     = $headers =~ m/^From:[ \t]*(.*)$/mi;
    ($const{mail_to})       = $headers =~ m/^To:[ \t]*(.*)$/mi;
    ($const{mail_cc})       = $headers =~ m/^Cc:[ \t]*(.*)$/mi;
    ($const{mail_subject})  = $headers =~ m/^Subject:[ \t]*(.*)$/mi;

    insert();

    # move() removes the source on success, so no separate unlink is needed
    move($path, "$const{maildir}parsed/$const{currentfile}")
        or warn "Can't move $path: $!\n";
}

opendir(my $dh, $const{maildir}) or exit 10;    # Can't open maildir
dbconnect();
foreach my $thisfile (readdir($dh)) {
    $const{currentfile} = $thisfile;
    parse("$const{maildir}$thisfile");
}
closedir($dh);
dbdisconnect();
exit 0;
```

Re: Archive mail into a database
by jdporter (Paladin) on Jan 11, 2006 at 16:01 UTC

    Putting all your variables into a hash completely negates the benefits of using strict 'vars': a misspelt hash key is not caught at compile time the way a misspelt variable name is.
    You may as well make them all global variables and turn off strict 'vars'. Your code would be shorter and easier to read.

    Now, if you really are tied :-) to the hash-based variable approach, you could at least use something like Tie::StrictHash, or the possibly more namespace-scalable Tie::SecureHash.
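Perl's core Hash::Util module offers a similar safeguard without a tie: lock_keys restricts a hash to its current set of keys, so a misspelt key dies at run time instead of silently yielding undef. This is a swapped-in alternative, not one of the Tie:: modules named above; a minimal sketch using a cut-down version of the %const hash:

```perl
use strict;
use warnings;
use Hash::Util qw(lock_keys);

my %const = (
    dbhost => 'localhost',
    dbname => 'mail',
);

# Plain hash: the typo compiles and quietly returns undef.
my $plain = $const{dbhots};

# After lock_keys, reading or writing any key not already present dies.
lock_keys(%const);
my $ok    = eval { my $h = $const{dbhots}; 1 };
my $error = $@;

print defined $plain ? "plain: defined\n" : "plain: undef\n";
print $ok ? "locked: no error\n" : "locked: $error";
```

Existing keys still read and write normally; only keys outside the locked set are rejected.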

    We're building the house of the future together.
Re: Archive mail into a database
by jdporter (Paladin) on Jan 11, 2006 at 16:26 UTC

    I don't believe that you really need to care about that close() failing, since the file is open for reading. (Of course it doesn't hurt to check.)
    But I'd be much more concerned about the possible failure of those move and unlink calls.

    Btw... Why are you using external 'mv'? You could use the move function of the standard File::Copy module; it is both portable and 'smart'.
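A sketch of checking the move result instead of chaining an unlink onto it. File::Copy's move returns true on success (and has already removed the source by then) and false on failure, with the reason in $!. The temporary directory and file names are illustrative assumptions standing in for the script's maildir and its parsed/ subdirectory:

```perl
use strict;
use warnings;
use File::Copy qw(move);
use File::Temp qw(tempdir);

# Throwaway stand-in for the mail directory and its parsed/ subdirectory.
my $maildir = tempdir(CLEANUP => 1);
mkdir "$maildir/parsed" or die "Can't create $maildir/parsed: $!";

my $src = "$maildir/msg1";
open(my $fh, '>', $src) or die "Can't create $src: $!";
print {$fh} "Subject: test\n\nbody\n";
close($fh) or die "Can't close $src: $!";

# move() returns true on success and removes the source itself,
# so there is nothing left to unlink afterwards.
my $moved = move($src, "$maildir/parsed/msg1");
warn "Can't move $src: $!\n" unless $moved;

# On failure it returns false and leaves the reason in $!.
my $missing_moved = move("$maildir/no-such-msg", "$maildir/parsed/no-such-msg");
warn "Can't move $maildir/no-such-msg: $!\n" unless $missing_moved;
```

Whether a failed move should only warn (and leave the file for the next run) or abort the whole run is a policy choice; warning keeps the rest of the batch going.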

Re: Archive mail into a database
by jdporter (Paladin) on Jan 11, 2006 at 17:03 UTC

    It seems to me that you could get "unexpected" results by doing

    $const{mail_to} = $1 if $mail =~ m/To: (.*?)\n/s;
    That only sets the variable if the match succeeds; if it fails, the variable keeps its current value, which (because of the way you're using global variables) could be what was found in the previous file. I think I'd do it this way:
    ($const{mail_to}) = $mail =~ m/To: (.*?)\n/s;
    That way, the variable is properly set to undef if no match is made. Personally, I'd be inclined to pass the values as arguments to insert, rather than using global variables.
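Along the lines of passing values as arguments, the parsing step can return its fields in a fresh hash instead of writing them into a shared one, so nothing can leak from one message into the next. A minimal sketch; parse_message is a hypothetical helper, not part of the original script. It also anchors each header name to the start of a line (so Delivered-To: is not taken for To:) and unfolds continuation lines, neither of which the original regexes do. The returned hash could then be handed to insert():

```perl
use strict;
use warnings;

# Hypothetical helper: split a raw message into headers and body and
# return the fields as a new hash reference (no global state).
sub parse_message {
    my ($mail) = @_;

    # Headers end at the first empty line; the rest is the body.
    my ($headers, $body) = split /\n\n/, $mail, 2;
    $headers = '' unless defined $headers;
    $body    = '' unless defined $body;

    # Unfold continuation lines (lines starting with a space or tab).
    (my $unfolded = $headers) =~ s/\n(?=[ \t])//g;

    # Match a header only at the start of a line, case-insensitively as
    # header names are; returns undef when the header is absent.
    my $get = sub {
        my ($name) = @_;
        my ($value) = $unfolded =~ /^\Q$name\E:[ \t]*(.*)$/mi;
        return $value;
    };

    return {
        datetime => $get->('Delivery-date'),
        from     => $get->('From'),
        to       => $get->('To'),
        cc       => $get->('Cc'),
        subject  => $get->('Subject'),
        headers  => $headers,
        body     => $body,
    };
}

my $raw = "Delivered-To: archive\@example.com\n"
        . "From: alice\@example.com\n"
        . "To: bob\@example.com\n"
        . "Subject: quarterly\n"
        . " report\n"
        . "\n"
        . "Hello Bob.\n";

my $fields = parse_message($raw);
print "To: $fields->{to}\n";
print "Subject: $fields->{subject}\n";
print defined $fields->{cc} ? "Cc present\n" : "Cc absent\n";
```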

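    Update: You need parentheses around the LHS when assigning a regex capture to a scalar; without them the match runs in scalar context and returns only success or failure, not the captured text.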
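The three assignment forms discussed here behave quite differently, which is easy to see in isolation. A small demonstration with made-up header text: the conditional form keeps the previous message's value, the list assignment yields undef, and a scalar assignment without parentheses stores the match's success flag rather than the captured text:

```perl
use strict;
use warnings;

my $previous = "To: old\@example.com\n";
my $current  = "From: someone\@example.com\n";   # no To: header

# 1. Conditional assignment: keeps the stale value when the match fails.
my $to_cond;
$to_cond = $1 if $previous =~ m/To: (.*?)\n/s;
$to_cond = $1 if $current  =~ m/To: (.*?)\n/s;

# 2. List assignment: undef when the match fails.
my ($to_list) = $current =~ m/To: (.*?)\n/s;

# 3. Scalar assignment (no parentheses): the match result, not the capture.
my $to_scalar = $previous =~ m/To: (.*?)\n/s;

print "conditional: $to_cond\n";
print 'list: ', (defined $to_list ? $to_list : 'undef'), "\n";
print "scalar: $to_scalar\n";
```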

Re: Archive mail into a database
by zby (Vicar) on Jan 11, 2006 at 16:40 UTC
    How about using existing modules to parse the e-mails instead of regexes? A quick search on CPAN turned up Email::Simple and Email::Abstract, though I don't know whether they are of any use here.