I've got a large number of mbox-format mailboxes here, accumulated over years of mail backups upon mail backups. I'm trying to go through them all and remove any duplicate messages before finally archiving them to permanent offline storage. I've been sorting them into more granular hierarchies over the years, so what starts as ~mail/foo may later become ~mail/Projects/foo, which contains mostly the same mail but could differ.
What I need to do is roll through each one, and yank out the duplicate emails that may have been put there from concatenating them or testing/debugging broken procmail recipes, and store them each in a single mbox file that contains only unique mail.
I've looked into Mail::Box::Mbox and Email::Folder::Mbox, but decided to try to roll my own first. Here's what I've got, and it seems to work so far.
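use strict;
use warnings;

undef $/;    # slurp mode: each read of <> returns one whole file
my %seen;    # Message-IDs that have already been printed

# split evaluates its EXPR in scalar context, so a bare <> there would read
# only the first file; map over list-context <> to cover every mailbox.
my @para = map { split /(\n\n+)/, $_ } <>;

while (defined($_ = shift @para)) {
    die "No From line!\n" unless /^From /;

    # Split the header block into logical fields (folded lines stay attached).
    my ($id) = map /^Message-ID:\s*(\S.*)/im, split /\n(?![ \t])/;
    warn "No Message-ID! [[$_]]\n" unless defined $id;

    # Append the body paragraphs up to the next "From " separator.
    $_ .= shift @para while @para and $para[0] !~ /^From /;

    # Messages without a Message-ID are always kept.
    print unless defined $id and $seen{$id}++;
}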
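The script above always keeps a message that has no Message-ID, so duplicates of such messages survive. Here is a minimal sketch of one possible fallback, not a tested solution: `dedup_key` is a hypothetical helper name, and the sketch assumes that duplicate copies are byte-identical apart from the envelope "From " line and trailing blank lines. It uses only the core Digest::MD5 module.

```perl
use strict;
use warnings;
use Digest::MD5 qw(md5_hex);

# Hypothetical helper: return a key for duplicate detection.
# Prefers the Message-ID header; if there is none, falls back to an MD5
# digest of the message with its envelope "From " line removed.
sub dedup_key {
    my ($msg) = @_;

    # The header block is everything before the first blank line.
    my ($head) = split /\n\n/, $msg, 2;
    $head = '' unless defined $head;

    # Split into logical header fields; folded lines stay attached.
    my ($id) = map /^Message-ID:\s*(\S.*)/im, split /\n(?![ \t])/, $head;
    if (defined $id) {
        $id =~ s/\s+\z//;    # drop trailing whitespace such as a stray CR
        return "id:$id";
    }

    # Assumption: copies differ only in the envelope line and trailing newlines.
    (my $rest = $msg) =~ s/\AFrom [^\n]*\n//;
    $rest =~ s/\n+\z//;
    return 'md5:' . md5_hex($rest);
}
```

In the loop, the key would be computed once the body has been appended, for example `print unless $seen{ dedup_key($_) }++;`. If copies can differ in headers such as Status, the digest would need to cover only the body or a chosen subset of headers instead.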
Comments? Improvements?