jonnyfolk has asked for the wisdom of the Perl Monks concerning the following question:

I have an email inbox for which I have written a script to fetch and read on demand and pick out the particular message of interest.

I had thought that splitting the message was going to be the most straightforward part as I expected some division between messages such as a === or other such specific ending on which a split can be done. Alas there was no such animal! The only constant that there seems to be is the word From at the beginning of the message, but of course if I split on that there is too much chance that it also happens to be in the message and the whole thing would go down the chute...

Since this is being done all the time by email software there must be a standard way of doing it and if anyone could help me I would as always be very grateful.

Replies are listed 'Best First'.
Re: Splitting email inbox text into separate messages
by Abigail-II (Bishop) on Mar 12, 2004 at 15:46 UTC
    Since this is being done all the time by email software there must be a standard way of doing it and if anyone could help me I would as always be very grateful.
    Well, there is more than one standard, and no doubt there's also email software that uses its own format. However, "^From " (there's a space after From) is perhaps the most common way of separating messages in email folders. Any leading From in a message body is escaped with a > sign.

    Having said that, there are a billion modules in the Mail:: hierarchy that will enable you to parse mail folders. It's rumoured that they work. But their sheer number has kept me away from them - I just parse folders myself, looking for leading "From " tokens. Has always worked for me so far!

    Abigail

      Any leading From in a message is escaped with a > sign...

      ...is the bit of information I was lacking. I shall carry on and split with confidence. Thanks very much, Abigail-II.
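
A minimal sketch of the split described above, not Abigail's actual code. It assumes the whole folder fits in memory, that every message starts with a line beginning "From ", and that the delivery agent escaped body lines as ">From ". The sub names split_mbox and unescape_from and the sample addresses are made up for illustration.

```perl
use strict;
use warnings;

# Split the text of an mbox-style folder into individual messages.
# Assumption: a line beginning "From " always starts a new message.
sub split_mbox {
    my ($text) = @_;
    # The zero-width lookahead splits *before* each "From " line, so the
    # separator line stays attached to the message it introduces.
    return grep { /\S/ } split /^(?=From )/m, $text;
}

# Undo the ">From " escaping inside a single message.
sub unescape_from {
    my ($message) = @_;
    $message =~ s/^>(From )/$1/mg;
    return $message;
}

# Made-up sample folder with one escaped body line.
my $inbox = <<'END';
From alice@example.com Thu Mar 11 22:33:09 2004
Subject: first

Hello.
>From here on the body continues.

From bob@example.com Fri Mar 12 09:00:00 2004
Subject: second

Bye.
END

my @messages = split_mbox($inbox);
printf "%d messages\n", scalar @messages;
print unescape_from($messages[0]);
```

Note that the unescaping is a guess in one direction: a body line that was genuinely written as ">From " cannot be told apart from an escaped one unless the folder uses a variant that escapes those lines too.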

Re: Splitting email inbox text into separate messages
by Happy-the-monk (Canon) on Mar 12, 2004 at 16:07 UTC
    I suggest taking a brave look at Mail::Box, if you don't know it yet.

    Sören

Re: Splitting email inbox text into separate messages
by mattriff (Chaplain) on Mar 12, 2004 at 15:49 UTC
    If we're talking about UNIX mbox format mailboxes, the first line of the message will always be like:

    From foo@example.com Thu Mar 11 22:33:09 2004

    From, a space, an e-mail address, and a date stamp. You should be pretty safe looking for that format and splitting on it.

    It's also probable that the mail delivery agent on the system you are getting the mail from is adding a > to the beginning of any body line that begins with From.

    - Matt Riffle
      VP Technology, pair Networks, Inc.
      (although, I speak only for myself; code is untested unless otherwise stated)
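
A hedged sketch of matching the full separator line rather than just "From ". The pattern below is an assumption based only on the example line above (a sender token followed by an asctime-style date); real folders may carry variations such as a time zone, so treat it as a starting point. The names $separator and split_on_separator are hypothetical.

```perl
use strict;
use warnings;

# Hypothetical pattern for an mbox separator line: "From ", a sender
# token, then a date such as "Thu Mar 11 22:33:09 2004". Under /x,
# literal spaces are written as [ ].
my $separator = qr/
    ^From [ ] \S+ [ ]+
    (?:Mon|Tue|Wed|Thu|Fri|Sat|Sun) [ ]
    (?:Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec) [ ]
    [ \d]\d [ ]
    \d\d:\d\d:\d\d [ ]
    \d{4} $
/mx;

# Split before every line that looks like a complete separator.
sub split_on_separator {
    my ($text) = @_;
    return grep { /\S/ } split /(?=$separator)/, $text;
}

# Made-up sample: the body line starting with "From " is NOT escaped,
# yet it does not look like a separator, so it does not cause a split.
my $inbox = <<'END';
From foo@example.com Thu Mar 11 22:33:09 2004
Subject: one

From the start, this was easy.

From bar@example.com Fri Mar 12 08:15:00 2004
Subject: two

Done.
END

my @messages = split_on_separator($inbox);
printf "%d messages\n", scalar @messages;
```

The stricter match costs a little flexibility but tolerates folders where the > escaping was not applied.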
Re: Splitting email inbox text into separate messages
by Vautrin (Hermit) on Mar 12, 2004 at 15:59 UTC

    You do not mention what format you are trying to deal with. Are you trying to read directly from a Unix mail spool? Are you trying to read a file in unix mbox format? Are you an Emacs user trying to read the Babyl formatted files produced by Rmail? Or, perhaps, is it one message per file with an index, and you need to parse the index and get rid of the headers? Or, is this for Outlook (Express) on a Windows system, using a completely different beast?

    I can't tell you how to parse all of those formats off the top of my head. I can tell you that Abigail's suggestion of splitting on lines that begin with "From " will work if you're using mbox format (and I think Babyl); however, it will fail if someone includes an e-mail to you with an unescaped "From " at the beginning of a line. (Which probably isn't a big concern.)

    However, the key to all of this is the format. If you know what format the program you are trying to read from is using, you can look up the specifications online. There's probably also a module on CPAN available to parse it so you don't have to code. (And that module will possibly even handle things like indexes.)


Re: Splitting email inbox text into separate messages
by jdtoronto (Prior) on Mar 12, 2004 at 15:50 UTC
    You make no mention of what OS or what email package you are using. There are some standards, but many formats are proprietary. Knowing which email client builds the mailboxes is the minimum we need before we can be of much help at all!

    jdtoronto

Re: Splitting email inbox text into separate messages
by ambrus (Abbot) on Mar 12, 2004 at 16:51 UTC

    Just search for a paragraph that starts with From.
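
One way to read that suggestion, sketched under assumptions: Perl's paragraph mode (setting $/ to the empty string) returns blank-line-separated chunks, so a new message can be started whenever a chunk begins with "From ". The sub name read_messages and the sample data are made up, and the in-memory filehandle is only there to keep the example self-contained.

```perl
use strict;
use warnings;

# Read a folder paragraph by paragraph and start a new message whenever
# a paragraph begins with "From ". Other paragraphs are appended to the
# message currently being collected.
sub read_messages {
    my ($fh) = @_;
    local $/ = "";    # paragraph mode: records end at blank lines
    my @messages;
    while (my $para = <$fh>) {
        if ($para =~ /\AFrom / or !@messages) {
            push @messages, $para;
        } else {
            $messages[-1] .= $para;
        }
    }
    return @messages;
}

# Made-up sample: the "From " line in the middle of a paragraph is not
# treated as a separator because no blank line precedes it.
my $inbox = <<'END';
From a@example.com Thu Mar 11 22:33:09 2004
Subject: one

Body text.
From mid-paragraph is ignored.

From b@example.com Fri Mar 12 08:15:00 2004
Subject: two

Bye.
END

open my $fh, '<', \$inbox or die "Cannot open in-memory file: $!";
my @messages = read_messages($fh);
close $fh;
print scalar(@messages), " messages\n";
```

A body paragraph that itself begins with an unescaped "From " would still be mistaken for a new message, so this relies on the same > escaping discussed earlier in the thread.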