gw1500se has asked for the wisdom of the Perl Monks concerning the following question:

I am writing a script that needs to parse email source, and in particular it needs to process the 'Received' headers. My script successfully extracts all the 'Received' headers using Mail::Message. However, the only module I can find that supposedly parses those headers is Mail::Field::Received. Mail::Field seems to want to read the email itself, but Mail::Message has already done that. Not being a Perl guru, I can only guess that, even though the documentation says Mail::Field::Received should not be used directly, doing so is the way to accomplish what I want. Basically, how do I pass a Received header from Mail::Message to Mail::Field::Received without Mail::Field re-reading the message source? In other words, is the structure of the headers returned by Mail::Message compatible with what Mail::Field::Received expects to parse? TIA.
```perl
#!/usr/bin/perl -w
use strict;
use Mail::Message;
use Mail::Field;
use Data::Dumper;

my $msg=Mail::Message->read(\*STDIN);
my $head=$msg->head();
my $from=$msg->from;
my @to=$msg->to;
my @subject=$msg->subject;
my @recv=$head->get('received');
foreach my $recv_item (@recv) {
    # I need to create an array of hashes, each of which
    # contains a parsed 'received' header
}
```
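If you would rather not depend on Mail::Field::Received at all, one possible substitute for that loop body is a small hand-rolled regex parser. The sketch below is exactly that, a swapped-in technique and not the module's algorithm: the helper name `parse_received_simple` is hypothetical, it assumes it is given an already-unfolded header body (inside the loop you would presumably pass `$recv_item->unfoldedBody`), and it only recognises the common `from`/`by`/`with`/`id`/`for`/date clauses, so unusual Received formats will simply come back with missing keys.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: a simplified, hand-rolled parser for ONE unfolded
# Received header body. It is not Mail::Field::Received; it only picks
# out the common "from ... by ... with ... id ... for ...; date" clauses
# and leaves out any key it cannot find.
sub parse_received_simple {
    my ($body) = @_;    # a copy, so the caller's string is untouched
    my %r;

    # The timestamp is whatever follows the last ';'. Strip it off so the
    # remaining regexes cannot match inside it.
    if ( $body =~ s/;\s*([^;]*?)\s*$// ) {
        $r{date} = $1;
    }
    $r{from}    = $1 if $body =~ /\bfrom\s+(\S+)/;
    $r{address} = $1 if $body =~ /\[(\d{1,3}(?:\.\d{1,3}){3})\]/;    # first bracketed IPv4
    $r{by}      = $1 if $body =~ /\bby\s+(\S+)/;
    $r{with}    = $1 if $body =~ /\bwith\s+(\S+)/;
    $r{id}      = $1 if $body =~ /\bid\s+(\S+)/;
    $r{for}     = $1 if $body =~ /\bfor\s+<?([^>\s;]+)>?/;
    return \%r;
}

# Example: one header body, as it might look after unfolding.
my @headers = (
    'from hawk.prod.itd.earthlink.net (hawk.prod.itd.earthlink.net [207.217.120.22]) '
  . 'by no2.superb.net (8.11.1/8.11.1) with ESMTP id f22K5xv12202 '
  . 'for <gboyd@expita.com>; Fri, 2 Mar 2001 15:05:59 -0500 (EST)',
);
my @parsed = map { parse_received_simple($_) } @headers;    # array of hashes
for my $rec (@parsed) {
    print join( ', ', map { "$_=$rec->{$_}" } sort keys %$rec ), "\n";
}
```

Each call returns a hash reference, so mapping over the headers gives the array of hashes described in the loop comment. A real parser such as Mail::Field::Received copes with far more header variants than these few regexes do.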
I think there is also a monkey wrench here: according to the documentation, headers can wrap. I think that means a single Received header can end up split across two array items. That will be another can of worms, but for now I will be happy just being able to parse the non-wrapped headers.
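For what it's worth, Mail::Message's `head->get` should already return one field object per header, folded or not, so wrapping is normally undone before you see it. If you ever have to deal with raw header text yourself, unfolding is defined in RFC 5322 section 2.2.3: a line break immediately followed by a space or tab is removed, joining the continuation to the previous line. A minimal pure-Perl sketch of that rule; `unfold_headers` is a hypothetical helper, and it assumes the input is the header block only, stopping at the first blank line:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: split a raw header block into logical fields,
# unfolding continuation lines as RFC 5322 section 2.2.3 describes.
# A line that starts with a space or tab belongs to the previous field.
sub unfold_headers {
    my ($raw) = @_;
    my @fields;
    for my $line ( split /\r?\n/, $raw ) {
        last if $line eq '';                  # blank line ends the header block
        if ( $line =~ /^[ \t]/ && @fields ) {
            $fields[-1] .= $line;             # continuation: append, keeping the WSP
        }
        else {
            push @fields, $line;              # start of a new field
        }
    }
    return @fields;
}

my $raw = "Received: from pacer2 (hsa184.pool015.at101.earthlink.net [216.249.86.184])\n"
        . " by hawk.prod.itd.earthlink.net (EL-8_9_3_3/8.9.3) with SMTP id MAA14914\n"
        . "Subject: Test message\n";
print "$_\n" for unfold_headers($raw);
```

The two-line Received header comes out as a single string, followed by the Subject field.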

Re: Mail::Message and Mail::Field
by Anonymous Monk on Nov 13, 2008 at 15:46 UTC
```perl
#!/usr/bin/perl --
use strict;
use warnings;
use Mail::Message;
use Mail::Field;
use Data::Dumper;
local $Data::Dumper::Indent=1;

my $file = <<'__MAIL__';
Received: from hawk.prod.itd.earthlink.net (hawk.prod.itd.earthlink.net [207.217.120.22])
 by no2.superb.net (8.11.1/8.11.1) with ESMTP id f22K5xv12202
 for <gboyd@expita.com>; Fri, 2 Mar 2001 15:05:59 -0500 (EST)
Received: from pacer2 (hsa184.pool015.at101.earthlink.net [216.249.86.184])
 by hawk.prod.itd.earthlink.net (EL-8_9_3_3/8.9.3) with SMTP id MAA14914
 for <gboyd@expita.com>; Fri, 2 Mar 2001 12:05:58 -0800 (PST)
Message-ID: <001301c0a353$feb287e0$a64cfea9@pacer2>
Reply-To: "Somebody" <SomeReplyAddr@somplace.com>
From: "Somebody" <somebozo@yahoo.com>
To: <gboyd@expita.com>
Subject: Test message
Date: Fri, 2 Mar 2001 12:04:31 -0800
Organization: SomeOrganization
X-Priority: 3
X-MSMail-Priority: Normal
X-Mailer: Microsoft Outlook Express 5.00.2919.6600
X-MimeOLE: Produced By Microsoft MimeOLE V5.00.2919.6600
X-Sorted: Default
Status: RO

Test message!!
Gotted from the http://www.expita.com/header1.html
__MAIL__

my $msg=Mail::Message->read($file);
for my $rec ( $msg->head->get('Received') ){
#    print "Received: $rec\n\n";
    my $received = Mail::Field->new('Received', $rec );
    printf "\n parsed_ok %s \n\n", $received->parsed_ok();
    if( $received->parsed_ok() ){
        print Dumper( $received->parse_tree ), "\n";
    } else {
        print $received->diagnostics(), "\n";
    }
}
__END__

 parsed_ok 1 

$VAR1 = {
  'for' => {
    'whole' => 'for <gboyd@expita.com>',
    'for' => '<gboyd@expita.com>'
  },
  'date_time' => {
    'hour' => '15',
    'rest' => '2001 15:05:59 -0500 (EST)',
    'zone' => '-0500 (EST)',
    'second' => '59',
    'month' => 'Mar',
    'day_of_year' => '2 Mar',
    'date_time' => 'Fri, 2 Mar 2001 15:05:59 -0500 (EST)',
    'month_day' => ' 2',
    'hms' => '15:05:59',
    'whole' => 'Fri, 2 Mar 2001 15:05:59 -0500 (EST)',
    'minute' => '05',
    'week_day' => 'Fri',
    'year' => '2001'
  },
  'comments' => [
    '(hawk.prod.itd.earthlink.net [207.217.120.22])',
    '(8.11.1/8.11.1)'
  ],
  'by' => {
    'whole' => 'by no2.superb.net',
    'domain' => 'no2.superb.net',
    'comments' => [
      '(8.11.1/8.11.1)'
    ]
  },
  'whole' => bless( [
    'Received',
    ' from hawk.prod.itd.earthlink.net (hawk.prod.itd.earthlink.net [207.217.120.22])
 by no2.superb.net (8.11.1/8.11.1) with ESMTP id f22K5xv12202
 for <gboyd@expita.com>; Fri, 2 Mar 2001 15:05:59 -0500 (EST)
'
  ], 'Mail::Message::Field::Fast' ),
  'from' => {
    'whole' => 'from hawk.prod.itd.earthlink.net (hawk.prod.itd.earthlink.net [207.217.120.22]) ',
    'domain' => 'hawk.prod.itd.earthlink.net',
    'from' => 'hawk.prod.itd.earthlink.net',
    'HELO' => 'hawk.prod.itd.earthlink.net',
    'address' => '207.217.120.22',
    'comments' => [
      '(hawk.prod.itd.earthlink.net [207.217.120.22])'
    ]
  },
  'id' => {
    'whole' => 'id f22K5xv12202',
    'id' => 'f22K5xv12202'
  },
  'with' => {
    'whole' => 'with ESMTP',
    'with' => 'ESMTP'
  }
};

 parsed_ok 1 

$VAR1 = {
  'for' => {
    'whole' => 'for <gboyd@expita.com>',
    'for' => '<gboyd@expita.com>'
  },
  'date_time' => {
    'hour' => '12',
    'rest' => '2001 12:05:58 -0800 (PST)',
    'zone' => '-0800 (PST)',
    'second' => '58',
    'month' => 'Mar',
    'day_of_year' => '2 Mar',
    'date_time' => 'Fri, 2 Mar 2001 12:05:58 -0800 (PST)',
    'month_day' => ' 2',
    'hms' => '12:05:58',
    'whole' => 'Fri, 2 Mar 2001 12:05:58 -0800 (PST)',
    'minute' => '05',
    'week_day' => 'Fri',
    'year' => '2001'
  },
  'comments' => [
    '(hsa184.pool015.at101.earthlink.net [216.249.86.184])',
    '(EL-8_9_3_3/8.9.3)'
  ],
  'by' => {
    'whole' => 'by hawk.prod.itd.earthlink.net',
    'domain' => 'hawk.prod.itd.earthlink.net',
    'comments' => [
      '(EL-8_9_3_3/8.9.3)'
    ]
  },
  'whole' => bless( [
    'Received',
    ' from pacer2 (hsa184.pool015.at101.earthlink.net [216.249.86.184])
 by hawk.prod.itd.earthlink.net (EL-8_9_3_3/8.9.3) with SMTP id MAA14914
 for <gboyd@expita.com>; Fri, 2 Mar 2001 12:05:58 -0800 (PST)
'
  ], 'Mail::Message::Field::Fast' ),
  'from' => {
    'whole' => 'from pacer2 (hsa184.pool015.at101.earthlink.net [216.249.86.184]) ',
    'domain' => 'hsa184.pool015.at101.earthlink.net',
    'from' => 'pacer2',
    'HELO' => 'pacer2',
    'address' => '216.249.86.184',
    'comments' => [
      '(hsa184.pool015.at101.earthlink.net [216.249.86.184])'
    ]
  },
  'id' => {
    'whole' => 'id MAA14914',
    'id' => 'MAA14914'
  },
  'with' => {
    'whole' => 'with SMTP',
    'with' => 'SMTP'
  }
};
```
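To get the array of hashes the original question asks for, each parse tree can be reduced to a flat record. A minimal sketch, assuming the key layout shown in the Data::Dumper output of the reply; `flatten_received` and the output key names are hypothetical choices, not part of Mail::Field, and any clause a header lacks comes back as undef:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: reduce one Mail::Field::Received parse tree (the
# hashref returned by parse_tree) to a flat hash. Key paths follow the
# Data::Dumper output in the reply; missing clauses come back as undef.
sub flatten_received {
    my ($tree) = @_;

    # Look up $tree->{$part}{$field} without autovivifying $tree->{$part}.
    my $get = sub {
        my ( $part, $field ) = @_;
        return ref $tree->{$part} eq 'HASH' ? $tree->{$part}{$field} : undef;
    };

    return +{
        from    => $get->( 'from',      'from' ),
        domain  => $get->( 'from',      'domain' ),
        address => $get->( 'from',      'address' ),
        by      => $get->( 'by',        'domain' ),
        with    => $get->( 'with',      'with' ),
        id      => $get->( 'id',        'id' ),
        'for'   => $get->( 'for',       'for' ),
        date    => $get->( 'date_time', 'date_time' ),
    };
}

# Example tree, shaped like part of the second dump in the reply.
my $tree = {
    from      => { from => 'pacer2', domain => 'hsa184.pool015.at101.earthlink.net',
                   address => '216.249.86.184' },
    by        => { domain => 'hawk.prod.itd.earthlink.net' },
    with      => { with => 'SMTP' },
    id        => { id => 'MAA14914' },
    'for'     => { 'for' => '<gboyd@expita.com>' },
    date_time => { date_time => 'Fri, 2 Mar 2001 12:05:58 -0800 (PST)' },
};
my @rows = ( flatten_received($tree) );
for my $row (@rows) {
    printf "%s (%s) -> %s via %s\n", $row->{from}, $row->{address}, $row->{by}, $row->{with};
}
```

Inside the reply's loop, something like `push @rows, flatten_received( $received->parse_tree ) if $received->parsed_ok;` would then collect one hash per Received header.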
Re: Mail::Message and Mail::Field
by Anonymous Monk on Nov 13, 2008 at 14:35 UTC
      What is missing? I can't post data without sanitizing it and by then it will be useless as data.
        So where are we supposed to get sample data?