in reply to Parsing out "unique" messages from mbox files

Message-IDs may not follow the format you are expecting; some clients and spambots send broken ones. My approach would be to eat one message at a time, md5 the whole thing, and put it in a hash of md5 => message. (You can either check for previous hits or overwrite the old ones; no real diff ;) Then, after eating the source, you can dump everything out with print $uniques{$_} foreach (keys %uniques); Once again, that is just how I would approach it, and it looks like your code will work if the MIDs are correct anyway.
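
A minimal sketch of the approach described above, not tested against real mailboxes. It assumes messages are separated by lines starting with "From " (the usual mbox convention) and that any such lines inside message bodies are already escaped. The sub name unique_messages and the file names in the usage comment are made up for illustration; Digest::MD5 ships with Perl.

```perl
use strict;
use warnings;
use Digest::MD5 qw(md5_hex);

# Read an mbox from a filehandle and return a hashref of md5 => message.
# Assumption: each message starts at a line beginning with "From ".
sub unique_messages {
    my ($fh) = @_;
    my (%uniques, $message);

    # Store the message collected so far, keyed by its digest.
    # A later copy with the same digest simply overwrites the earlier one.
    my $store = sub {
        return unless defined $message && length $message;
        $uniques{ md5_hex($message) } = $message;
    };

    while (my $line = <$fh>) {
        if ($line =~ /^From /) {
            $store->();        # finish the previous message, if any
            $message = '';     # start a new one
        }
        # Lines before the first "From " separator are skipped.
        $message .= $line if defined $message;
    }
    $store->();                # do not forget the last message

    return \%uniques;
}

# Usage (placeholder file names): perl dedupe.pl mail.mbox > unique.mbox
if (@ARGV) {
    open my $fh, '<', $ARGV[0] or die "Cannot open $ARGV[0]: $!";
    my $uniques = unique_messages($fh);
    print $uniques->{$_} foreach keys %$uniques;
}
```

Two things to keep in mind: the digest covers the whole message, including the "From " separator line and all headers, so only byte-identical copies collapse into one; and since the messages come back out of a hash, the original mailbox order is not preserved.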

HTH, and ++ on the use strict; too! (Don't see it much in SoPW:)

mhoward - at - hattmoward.org

Re: Re: Parsing out "unique" messages from mbox files
by waswas-fng (Curate) on Jun 13, 2003 at 14:09 UTC
    Aye, not only can Message-IDs be broken, you may also find that they are not unique. The md5 approach listed above does not rely on them, so it will still catch every unique message.

    -Waswas