PoorLuzer has asked for the wisdom of the Perl Monks concerning the following question:
I am using a Python-based tool called GMailBackup to archive my GMail locally.
This tool downloads each mail in full (headers and all) and stores it as an .eml file.
To work around some limitations in its source code, I need to parse these .eml files and grab these 4 fields:
It seemed like a simple slurp-and-parse job until I ran the program on .eml files from disparate sources.
It's a nightmare. The issues range from trivial differences (for example, the Outlook desktop client seems to send the field as "Message-ID", while the web client sends it as "Message-Id") to bigger ones, such as how different mail servers mark the boundary between headers and body.
For example, GMail and other mail servers separate the header using "----=_Part_".
However, some M$ servers seem to use "----_=_NextPart_", others "----NextPart", and so on.
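A minimal sketch in core Perl (plus the core Digest::MD5 module), assuming the .eml files are plain RFC 5322 messages. Two rules from that RFC do most of the work: header field names are case-insensitive, so lowercasing them makes "Message-ID" and "Message-Id" the same key, and the header always ends at the first empty line. Strings such as "----=_Part_" or "----_=_NextPart_" are MIME multipart boundaries declared in the Content-Type header and appear only inside the body, so they never need to be matched to find the end of the header. Because the loop stops at that empty line, only the header lines of even a very large mail are read. RFC 5322 says every message SHOULD (not MUST) carry a Message-ID, so the sketch also falls back to a digest of the raw header. `read_headers`, `mail_key` and the MD5 fallback are hypothetical illustrations, not part of GMailBackup.

```perl
use strict;
use warnings;
use Digest::MD5 qw(md5_hex);

# Hypothetical helper: read only the header block of a raw RFC 5322 message.
# Returns a hashref of lowercased field name => [values] plus the raw header text.
sub read_headers {
    my ($fh) = @_;
    my (%h, $last);
    my $raw = '';
    while (my $line = <$fh>) {
        $line =~ s/[\r\n]+\z//;          # accept CRLF or LF line endings
        last if $line eq '';             # first empty line ends the header: stop reading
        $raw .= "$line\n";
        if ($line =~ /^[ \t]/) {
            # folded continuation line: append it to the previous field, if any
            next unless defined $last;
            $line =~ s/^\s+/ /;
            $h{$last}[-1] .= $line;
        }
        elsif ($line =~ /^([^:\s]+):\s*(.*)\z/) {
            $last = lc $1;               # field names are case-insensitive
            push @{ $h{$last} }, $2;
        }
    }
    return (\%h, $raw);
}

# Hypothetical tracking key: the Message-ID if present, otherwise (an assumed
# fallback) an MD5 digest of the raw header block.
sub mail_key {
    my ($h, $raw) = @_;
    my $id = $h->{'message-id'} ? $h->{'message-id'}[0] : undef;
    return $id if defined $id && $id ne '';
    return 'md5:' . md5_hex($raw);
}

# Demo on an in-memory message with CRLF endings and a folded Subject.
my $text = "From: a\@example.com\r\nMessage-Id: <x\@example.com>\r\n"
         . "Subject: a long\r\n subject\r\n\r\nbody\r\n";
open my $fh, '<', \$text or die "open: $!";
my ($h, $raw) = read_headers($fh);
print mail_key($h, $raw), "\n";   # prints <x@example.com>
print $h->{subject}[0], "\n";     # prints a long subject
```

For real files, open each one with `open my $fh, '<:raw', $path` and pass the handle in; the rest of the mail is never read.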
I have three questions:
1. Is there some module/subroutine/script that I can use to parse these 4 fields reliably from raw mails? Some mails are long (even hundreds of MB), so the script should stop reading a mail as soon as these values have been found in the header.
2. Is it possible for a mail to have no "Message-ID" in its header? I have not come across any such email in the 4GB of mail I have downloaded so far, but are there any misbehaving servers we should be aware of?
This ID is used to keep track of which mails have already been downloaded, etc.; it serves as a sort of unique identifier for every email.
3. I would like to parse the "Date" field, which seems to be universally in a strftime-style format such as "Sat, 9 Feb 2008 17:14:18 -0730".
I tried:
use Date::Calc ();
if (my ($year, $month, $day) = Date::Calc::Parse_Date("Sat, 9 Feb 2008 17:14:18 -0730")) { printf "\n[*] %d %d %d", $year, $month, $day; }
but it fails.
I need to convert something like "Sat, 9 Feb 2008 17:14:18 -0730" to "20080209171418".
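The Date value follows the RFC 2822 (now RFC 5322) date-time syntax. Note that Date::Calc's `Parse_Date` returns only year, month and day, so the time of day would need separate handling in any case. Below is a minimal core-Perl sketch that matches the fields with a regex; `rfc2822_to_stamp` and `%MON` are hypothetical names. It assumes, as in the example above, that the wanted stamp is the sender's wall-clock time as written, so the -0730 offset is ignored. It accepts an optional weekday, optional seconds and obsolete two-digit years, but not every oddity seen in the wild.

```perl
use strict;
use warnings;

# Month abbreviations as used in RFC 2822/5322 dates.
my %MON = (jan => 1, feb => 2, mar => 3, apr => 4, may => 5,  jun => 6,
           jul => 7, aug => 8, sep => 9, oct => 10, nov => 11, dec => 12);

# Hypothetical helper: "Sat, 9 Feb 2008 17:14:18 -0730" -> "20080209171418".
# Keeps the wall-clock time as written and ignores the zone offset.
# Returns undef if the string does not look like an RFC 2822 date.
sub rfc2822_to_stamp {
    my ($date) = @_;
    my ($d, $mon, $y, $H, $M, $S) = $date =~ m{
        ^\s*(?:[A-Za-z]{3},\s*)?          # optional day of week, e.g. "Sat,"
        (\d{1,2})\s+([A-Za-z]{3})\s+(\d{2,4})\s+
        (\d{1,2}):(\d{2})(?::(\d{2}))?    # hh:mm with optional :ss
    }x or return;
    my $m = $MON{ lc $mon } or return;
    $y += ($y < 50 ? 2000 : 1900) if $y < 100;   # obsolete two-digit years
    return sprintf '%04d%02d%02d%02d%02d%02d', $y, $m, $d, $H, $M, $S // 0;
}

print rfc2822_to_stamp('Sat, 9 Feb 2008 17:14:18 -0730'), "\n";   # prints 20080209171418
```

If a UTC stamp is preferred instead, the offset would have to be applied (for example via the core Time::Local module); that variant is not shown here.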
Re: Parsing email for headers
by CountZero (Bishop) on Oct 04, 2009 at 16:45 UTC

Re: Parsing email for headers
by zwon (Abbot) on Oct 04, 2009 at 16:54 UTC

Re: Parsing email for headers
by ww (Archbishop) on Oct 05, 2009 at 00:49 UTC
by PoorLuzer (Beadle) on Oct 05, 2009 at 05:01 UTC

Re: Parsing email for headers
by McD (Chaplain) on Oct 05, 2009 at 02:06 UTC

Re: Parsing email for headers
by dsheroh (Monsignor) on Oct 05, 2009 at 10:34 UTC