in reply to Parsing email for headers
CountZero and zwon have provided guidance on parsing the headers. This expands on zwon's referral to Date::Parse.
#!/usr/bin/perl
use warnings;
use strict;
use Date::Parse;

# 799107

my $data = '';
while ( <DATA> ) {
    # print $_;
    $data = $_;
    chomp ($data);
    if ($data =~ /^DATE:\s+(\w{3}, \d+ \w{3} \d{4} \d\d:\d\d:\d\d) ([+-]\d{4})/ ) {
        my $date = $1;  # Don't do this; check existence of $1
        my $zone = $2;  # and $2 before you try to use them!
        print "'Date' found: $date which converts to: ";
        my $time = str2time($date);
        print "$time Zone Offset: $zone\n";
        print "\t Restringified: " . localtime($time) . "\n"; # reconvert, solely as a check on above
    }
}

=head OUTPUT
'Date' found: Sat, 9 Feb 2008 17:14:18 which converts to: 1202595258 Zone Offset: -0730
         Restringified: Sat Feb 9 17:14:18 2008
'Date' found: Sun, 10 Feb 2008 04:23:55 which converts to: 1202635435 Zone Offset: +0400
         Restringified: Sun Feb 10 04:23:55 2008
=cut

__DATA__
SUBJECT: test
FROM: John Smith
DATE: Sat, 9 Feb 2008 17:14:18 -0730
TO: Joe Doe
additional for demo only
DATE: Sun, 10 Feb 2008 04:23:55 +0400
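One caveat on the above: str2time() is handed the date string with the zone already stripped off, so the epoch value it returns is relative to the local time zone of whatever machine runs the script. If you want the zone offset applied, here is a minimal sketch using only the core Time::Local module. The helper name header_date_to_epoch is made up for this illustration, and it assumes the date always has the exact "9 Feb 2008 17:14:18 -0730" shape seen in the sample data.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Time::Local qw(timegm);    # core module

# Map month abbreviations to the 0-based month numbers timegm() expects.
my @names = qw(Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec);
my %MONTH;
@MONTH{@names} = 0 .. 11;

# Hypothetical helper (not part of any module): turn a header date such as
# "Sat, 9 Feb 2008 17:14:18 -0730" into UTC epoch seconds.
# Returns undef if the string does not have the assumed shape.
sub header_date_to_epoch {
    my ($str) = @_;
    my ($mday, $mon, $year, $h, $m, $s, $sign, $zh, $zm) =
        $str =~ /(\d{1,2}) (\w{3}) (\d{4}) (\d\d):(\d\d):(\d\d) ([+-])(\d\d)(\d\d)/
        or return;
    return unless exists $MONTH{$mon};

    # Zone offset in seconds, east of UTC positive.
    my $offset = ( $zh * 3600 + $zm * 60 ) * ( $sign eq '-' ? -1 : 1 );

    # timegm() treats the fields as UTC; subtracting the offset gives true UTC.
    return timegm( $s, $m, $h, $mday, $MONTH{$mon}, $year ) - $offset;
}

print header_date_to_epoch('Sat, 9 Feb 2008 17:14:18 -0730'), "\n";    # 1202604258
```

Feed the result to gmtime() rather than localtime() if you want to eyeball it independently of your own zone.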
You could also roll your own (not recommended) if, for some reason, you want to avoid using an email module. Read on:
/me didn't bother to create a set of .eml files to read. Reading from files rather than from __DATA__ is left as an exercise to the OP.
#!/usr/bin/perl
use warnings;
use strict;
use Date::Parse;

# 799107

my $data = '';
while ( <DATA> ) {
    # print $_;
    $data = $_;
    chomp ($data);
    if ($data =~ /(^SUBJECT: .*)/) {
        print;
    }
    if ($data =~ /(^FROM: .*)/) {
        print;
    }
    if ($data =~ /(\w{3}, \d+ \w{3} \d{4} \d\d:\d\d:\d\d) ([+-]\d{4})/ ) {
        my $date = $1;  # Don't do this; check existence of $1
        my $zone = $2;  # and $2 before you try to use them!
        print "'Date' found: $date which converts to: ";
        my $time = str2time($date);
        print "$time Zone Offset: $zone\n";
        print "\t Restringified: " . localtime($time) . "\n"; # reconvert, solely as a check on above
    }
    if ($data =~ /Message-ID/ix) {
        print "'ID' found: $data \n";
    }
    if ($data =~ /(^TO: .*)/) {
        print;
        # last; # uncomment to make "the script ... quit reading the mail" after the "TO:" field
    }
    if ($data =~ /-{4}=_Part_.*|-{4}_{0,1}={0,1}_{0,1}NextPart_{0,1}.*/x ) {
        print "'Part' header found: $data\n";
    }
}

=head OUTPUT
SUBJECT: test
FROM: John Smith
'Date' found: Sat, 9 Feb 2008 17:14:18 which converts to: 1202595258 Zone Offset: -0730
         Restringified: Sat Feb 9 17:14:18 2008
TO: Joe Doe
'ID' found: Message-ID <F6E1D1E016C6A7468EEA1708CA24F72B1E363E@SERVER.fake.local>
'ID' found: Message-Id <6A7468EEA17F6E1D1E016C08CA24F72B1E363E@SERVER.fake.com>
'Part' header found: ----=_Part_abcd
'Part' header found: ----_=_NextPart_1234
'Part' header found: ----NextPartXYZ
=cut

__DATA__
SUBJECT: test
FROM: John Smith
DATE: Sat, 9 Feb 2008 17:14:18 -0730
TO: Joe Doe
Message-ID <F6E1D1E016C6A7468EEA1708CA24F72B1E363E@SERVER.fake.local>
Message-Id <6A7468EEA17F6E1D1E016C08CA24F72B1E363E@SERVER.fake.com>
----=_Part_abcd
----_=_NextPart_1234
----NextPartXYZ
Text of message here.
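As for the exercise of reading from files instead of __DATA__, one possible starting point is sketched below. It assumes the file names arrive on the command line and that a blank line separates the headers from the body. The helper name read_header_lines is made up here; adapt it to however your .eml files are actually laid out.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: collect lines from an open filehandle up to
# (but not including) the first blank line, which is assumed to mark
# the end of the header block.
sub read_header_lines {
    my ($fh) = @_;
    my @lines;
    while ( my $line = <$fh> ) {
        $line =~ s/\r?\n\z//;    # strip LF or CRLF line endings
        last if $line eq '';     # blank line: headers are done
        push @lines, $line;
    }
    return @lines;
}

# Process every file named on the command line (assumed usage).
for my $file (@ARGV) {
    open my $fh, '<', $file or do { warn "Cannot open $file: $!\n"; next };
    print "$_\n" for read_header_lines($fh);
    close $fh;
}
```

Each returned line can then go through the same regexes used in the loop above. Stopping at the blank line also keeps anything in the message body that merely looks like a header from being picked up.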
Your post seems to be a bit conflicted: you say you need only the first four fields, but then discuss the variance in the "...Part..." lines as if it were an issue. Since you haven't provided any guidance on how you may want to use them, the above merely demonstrates use of a case-insensitive regex.
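If you do end up wanting the ID itself rather than the whole line, a capturing variant of the case-insensitive match might look like the sketch below. The helper name message_id is hypothetical, and the colon is treated as optional only because the sample data above omits it.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: return the text between the angle brackets of a
# Message-ID line, or undef if the line is not a Message-ID line.
# The /i flag lets Message-ID, Message-Id and message-id all match.
sub message_id {
    my ($line) = @_;
    return $line =~ /^Message-ID:?\s*<([^>]+)>/i ? $1 : undef;
}

for my $line (
    'Message-ID <F6E1D1E016C6A7468EEA1708CA24F72B1E363E@SERVER.fake.local>',
    'Message-Id <6A7468EEA17F6E1D1E016C08CA24F72B1E363E@SERVER.fake.com>',
    'TO: Joe Doe',
) {
    my $id = message_id($line);
    print defined $id ? "ID: $id\n" : "no ID in: $line\n";
}
```

Anchoring with ^ means a line that merely mentions "Message-ID" somewhere in its text is not matched, which is a stricter test than the unanchored one in the script above.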
And, just BTW, you probably meant "disparate" rather than "desparate."
:-)
Re^2: Parsing email for headers
by PoorLuzer (Beadle) on Oct 05, 2009 at 05:01 UTC