I have about 6500 email messages in a Pine mailbox that I need to retrieve data from. From what I've read so far, I'm working under the assumption that Pine uses the "mbox" format for mail (I'm using Pine 4.53 on a Linux box), meaning it stores all the email from one mailbox in a single flat text file. Please let me know if I'm wrong about the mbox format.

Constraints and goals:
  1. Read all the messages and get what I want (see below for specifics)
  2. I will be looking at every single message in the mailbox
  3. I can safely ignore header information, except for Subject
  4. There will be no attachments
  5. If all goes well, I will be using this script once to collect the data, and that's it
  6. I'm an object-oriented noob (I've never had to use OO yet)

I've been reading, and the sheer number of mbox and mail parsers available has me a little turned around. So I'm looking for advice before I begin writing the script.

I've looked at:

this node, where the poster asks the same question but doesn't give file specifics. I hope I've pinned those down above.

this node, where the guy uses Mail::MboxParser to parse.

Mail::MboxParser::Mail::Body

Mail::Box. From the page, it looks like I could just do a foreach over every email, which I think would work, but how slow is it going to be?

I also saw mention of just looking for paragraphs that start with
"^From <email address> <date>"
Should I do this instead, since it seems like the most basic way of doing it?
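
For reference, the basic approach could look something like the minimal sketch below. It uses only core Perl. The helper name `split_mbox` and the sample addresses are made up for illustration, and the separator test (a line after a blank line that starts with "From ", then an address, then something containing a clock time) is a heuristic assumption about how Pine writes the file, not a full mbox parser.

```perl
use strict;
use warnings;

# split_mbox is a hypothetical helper, not part of any CPAN module.
# It takes the whole mailbox as one string and returns one string per
# message (headers plus body, without the "From " separator line).
sub split_mbox {
    my ($text) = @_;
    my @messages;
    my $current;
    my $prev_blank = 1;    # the very first line may be a separator

    for my $line (split /^/m, $text) {
        # Heuristic separator test (an assumption): "From ", an address,
        # then something containing a clock time such as 09:36:40.
        if ($prev_blank && $line =~ /^From \S+\s.*\d{1,2}:\d{2}/) {
            push @messages, $current if defined $current;
            $current = '';
        }
        elsif (defined $current) {
            $current .= $line;
        }
        $prev_blank = ($line =~ /^\s*$/) ? 1 : 0;
    }
    push @messages, $current if defined $current;
    return @messages;
}

# Made-up two-message mailbox, for illustration only.
my $sample = join '',
    "From backup\@example.com Wed May  5 09:43:20 2004\n",
    "Subject: Backup SUCCESS\n",
    "\n",
    "Backup ID: 123\n",
    "\n",
    "From backup\@example.com Wed May  5 10:00:00 2004\n",
    "Subject: Backup SUCCESS\n",
    "\n",
    "Backup ID: 124\n";

my @msgs = split_mbox($sample);
print scalar(@msgs), " messages\n";
```

For the real mailbox the file would be read into a string first (or the same loop run over a filehandle). A body line that happens to look like a separator would fool this sketch, which is one argument for using one of the modules instead.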

the mail will look like this:

    ...normal header info...
    Subject: Backup SUCCESS
    ..rest of header...
    Backup ID: 123
    Path: <path>
    Backup Type: one of 3 different types
    Size: <in Kbytes>
    ...some other junk I don't need...
    Start time: Wed May 5 09:36:40 2004
    End time: Wed May 5 09:43:19 2004
    Errors: 0
    some other stuff
    Elapsed time: 0 hr 6 mn 39 s
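
Since every field sits on its own labelled line, pulling them out of one message can be sketched with a few anchored regexes. This is only an illustration: `parse_backup_mail` and the hash keys are hypothetical names, the values in the sample message are invented, and the exact spacing after each label in the real mail is an assumption.

```perl
use strict;
use warnings;

# parse_backup_mail is a hypothetical helper. It takes one message as a
# string and returns a hash reference; a field that is not found is undef.
sub parse_backup_mail {
    my ($msg) = @_;
    my %rec;
    # /m lets ^ and $ match at line boundaries inside the message.
    ($rec{subject}) = $msg =~ /^Subject:[ \t]*(.*?)[ \t]*$/m;
    ($rec{id})      = $msg =~ /^Backup ID:[ \t]*(\d+)/m;
    ($rec{path})    = $msg =~ /^Path:[ \t]*(.*?)[ \t]*$/m;
    ($rec{type})    = $msg =~ /^Backup Type:[ \t]*(.*?)[ \t]*$/m;
    ($rec{size_kb}) = $msg =~ /^Size:[ \t]*(\d+)/m;
    return \%rec;
}

# Invented sample values, for illustration only.
my $sample = <<'END';
Subject: Backup SUCCESS

Backup ID: 123
Path: /some/path
Backup Type: full
Size: 4096
Errors: 0
END

my $rec = parse_backup_mail($sample);
print "$rec->{id} $rec->{type} $rec->{size_kb}\n";    # prints "123 full 4096"
```

Because the first match wins, the Subject is taken from the header as long as the header comes before the body, which it does in an mbox file.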

I'm grabbing the ID, the path, the type of backup, and the size, and I'm going to throw them into a MySQL table. The ID is the primary key, but the size and elapsed time aren't stored anywhere except in the email, so we want to grab them for comparisons and whatnot.
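
For the comparisons, the elapsed time is probably easiest to store as a plain number of seconds. A minimal sketch of that conversion, assuming the "H hr M mn S s" layout from the sample mail (`elapsed_to_seconds` is a hypothetical helper name):

```perl
use strict;
use warnings;

# elapsed_to_seconds is a hypothetical helper. It assumes the
# "H hr M mn S s" layout shown in the sample mail.
sub elapsed_to_seconds {
    my ($text) = @_;
    my ($h, $m, $s) = $text =~ /(\d+)\s*hr\s+(\d+)\s*mn\s+(\d+)\s*s\b/
        or return;    # nothing (undef in scalar context) on a different layout
    return $h * 3600 + $m * 60 + $s;
}

print elapsed_to_seconds('0 hr 6 mn 39 s'), "\n";    # prints 399
```

The 399 seconds agree with the start and end times in the sample (09:36:40 to 09:43:19).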

Given what I'm faced with, which way should I go?

Thanks.


In reply to Parsing a Pine mailbox by tdp05
