in reply to deleting duplicate emails from a mail file

I'd suggest something a little more robust than what Chady suggested: scan through all the emails, computing a checksum (I'd use MD5) of the body plus certain header fields (Subject, From, etc.). If two messages have the same checksum, they are most likely duplicates. If you compare the entire header instead, you may miss some duplicates, since a duplicate can carry a different mail queue ID or arrive at a slightly different time. Storing only the checksums also means you don't have to keep the full content of every message around, which could be a problem if some of them are large.
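To make the idea concrete, here is a minimal sketch in Python (chosen only for illustration; the thread doesn't prescribe a language). The function names `message_checksum` and `unique_messages`, and the choice of Subject and From as the hashed headers, are my assumptions, not anything from the original question.

```python
import hashlib
from email import message_from_string


def message_checksum(msg, headers=("Subject", "From")):
    """Return an MD5 hex digest of selected headers plus the body.

    Hypothetical helper: the header list is an assumption; adjust to taste.
    """
    digest = hashlib.md5()
    for name in headers:
        value = str(msg.get(name, ""))
        digest.update(value.encode("utf-8", "replace"))
        digest.update(b"\0")  # separator so adjacent fields cannot run together
    # Hash every leaf part of the message; containers hold no content themselves.
    for part in msg.walk():
        if part.is_multipart():
            continue
        digest.update(part.get_payload(decode=True) or b"")
    return digest.hexdigest()


def unique_messages(messages):
    """Keep the first message seen for each checksum, drop later ones."""
    seen = set()  # only digests are stored, never full message content
    kept = []
    for msg in messages:
        checksum = message_checksum(msg)
        if checksum in seen:
            continue
        seen.add(checksum)
        kept.append(msg)
    return kept


if __name__ == "__main__":
    first = message_from_string(
        "Received: by mx1 id AAA111\nSubject: hi\nFrom: a@example.com\n\nHello\n"
    )
    second = message_from_string(
        "Received: by mx2 id BBB222\nSubject: hi\nFrom: a@example.com\n\nHello\n"
    )
    print(len(unique_messages([first, second])))
```

For a real mail file you could probably feed this from the standard `mailbox.mbox` class, which yields message objects when iterated. Note that the Received header differs between the two sample messages but does not affect the checksum, which is the point of hashing only selected headers.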

Re: Re: deleting duplicate emails from a mail file
by asiufy (Monk) on Jun 12, 2001 at 19:51 UTC
    Well, sometimes messages are duplicates but have tiny differences between them, like an extra blank line or a .sig. I've seen this happen frequently with mailing list managers.

    So, perhaps it would be better just to go by the headers...
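A header-only comparison along those lines might look like the sketch below. It is in Python purely for illustration, and both the name `header_key` and the choice of From, Subject, and Date as the identifying headers are assumptions on my part. Whether those three headers survive a given mailing list manager unchanged is something you would need to check against your own mail.

```python
from email import message_from_string


def header_key(msg, headers=("From", "Subject", "Date")):
    """Build a comparison key from selected headers only, ignoring the body.

    Hypothetical helper: whitespace is collapsed and case is folded so that
    differences in header folding or capitalization do not matter.
    """
    return tuple(
        " ".join(str(msg.get(name, "")).split()).lower() for name in headers
    )


def drop_header_duplicates(messages):
    """Keep the first message seen for each header key."""
    seen = set()
    kept = []
    for msg in messages:
        key = header_key(msg)
        if key in seen:
            continue
        seen.add(key)
        kept.append(msg)
    return kept


if __name__ == "__main__":
    original = message_from_string(
        "From: a@example.com\nSubject: hi\n"
        "Date: Tue, 12 Jun 2001 19:51:00 +0000\n\nHello\n"
    )
    with_sig = message_from_string(
        "From: a@example.com\nSubject: hi\n"
        "Date: Tue, 12 Jun 2001 19:51:00 +0000\n\nHello\n\n-- \nsome sig\n"
    )
    print(len(drop_header_duplicates([original, with_sig])))
```

The trade-off is the mirror image of the checksum approach: bodies that differ by a blank line or a .sig no longer matter, but two genuinely different messages with identical values in the chosen headers would be treated as duplicates.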