hacker has asked for the wisdom of the Perl Monks concerning the following question:
What I need to do is roll through each mbox file, yank out the duplicate emails that may have ended up there from concatenating files or from testing/debugging broken procmail recipes, and store each in a single mbox file that contains only unique mail.
I've looked into Mail::Box::Mbox and Email::Folder::Mbox, but decided to try to roll my own first. Here's what I've got, and it seems to work so far.
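    use strict;

    undef $/;    # slurp mode: read a whole file in one go
    my %seen;    # Message-IDs already printed

    # Split into paragraphs, keeping the blank-line separators.
    my @para = split /(\n\n+)/, <>;

    while (defined($_ = shift @para)) {
        die "No From line!\n" unless /^From /;

        # Split the header into logical lines and grab the Message-ID.
        my ($id) = map /^Message-ID:\s(\S.*)/im, split /\n(?! )/;
        warn "No Message-ID! [[$_]]\n" unless defined $id;

        # Append everything up to the next "From " paragraph.
        $_ .= shift @para while @para and $para[0] !~ /^From /;

        print unless defined $id and $seen{$id}++;
    }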
Comments? Improvements?
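For comparison, here is a minimal sketch of the same idea wrapped in a sub so it can be tested on a string instead of STDIN. The name `unique_messages` and the optional shared `$seen` hash are hypothetical, not part of the script above. It assumes each "From " separator is preceded by a blank line and that body lines starting with "From " have already been escaped. Unlike the script above, it also unfolds header continuation lines that start with a tab.

```perl
use strict;
use warnings;

# Hypothetical helper: return the messages in $mbox_text, keeping only the
# first message seen for each Message-ID. Pass the same $seen hashref across
# calls to de-duplicate over several files.
sub unique_messages {
    my ($mbox_text, $seen) = @_;
    $seen ||= {};
    my @kept;

    # Zero-width split: a new message starts at "From " after a blank line.
    for my $msg (split /(?<=\n\n)(?=From )/, $mbox_text) {
        next unless $msg =~ /^From /;    # skip anything before the first message

        # The header block ends at the first blank line.
        my ($head) = split /\n\n/, $msg, 2;

        # Unfold continuation lines (leading space or tab) before matching.
        $head =~ s/\n[ \t]+/ /g;

        my ($id) = $head =~ /^Message-ID:[ \t]*(\S+)/im;

        # Messages without a Message-ID cannot be compared, so keep them all.
        push @kept, $msg unless defined $id and $seen->{$id}++;
    }
    return @kept;
}

# Small made-up sample: the first two messages share a Message-ID.
my $sample = <<'END';
From alice@example.com Fri Jun 13 13:34:00 2003
Message-ID: <1@example.com>
Subject: hello

first copy

From alice@example.com Fri Jun 13 13:34:00 2003
Message-ID: <1@example.com>
Subject: hello

second copy

From bob@example.com Fri Jun 13 14:04:00 2003
Message-ID: <2@example.com>
Subject: other

different message
END

my @unique = unique_messages($sample);
printf "%d unique message(s)\n", scalar @unique;
```

To use it as a filter, something like `print unique_messages(do { local $/; <STDIN> });` should behave much like the script above, except that it stays silent about a missing Message-ID instead of warning.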
Replies are listed 'Best First'.

Re: Parsing out "unique" messages from mbox files
by meredith (Friar) on Jun 13, 2003 at 13:34 UTC
by waswas-fng (Curate) on Jun 13, 2003 at 14:09 UTC

Re: Parsing out "unique" messages from mbox files
by valdez (Monsignor) on Jun 13, 2003 at 14:04 UTC

Re: Parsing out "unique" messages from mbox files
by blahblah (Friar) on Jun 13, 2003 at 16:33 UTC

Re: Parsing out "unique" messages from mbox files
by Aristotle (Chancellor) on Jun 15, 2003 at 13:53 UTC