perlnewbie9292 has asked for the wisdom of the Perl Monks concerning the following question:

Hello, I have the following script, and I have been pulling my hair out trying to get the second part working. Please keep in mind I am pretty new to Perl. I am having trouble figuring out how to capture the lines that contain the string "RCPT TO" and add each captured line to the correct section, below the To: field. One file will sometimes contain one message and sometimes several. In this example it contains three messages, and the script separates them into three sections. Now I want to append all the RCPT lines to their corresponding sections. This is code I use on a box which saves input from the web. For some reason, every time I try to match the RCPT line with a regex, it captures the lines, but then my buffer is empty and contains only the RCPT data. Thanks in advance for the help.
#!/usr/bin/perl
use strict;
use warnings;
use Data::Dumper;

my $lineBufferStatus = 0;
my @rcptBuffer       = ();
my @linesBuffer      = ();
my $msgUpdateCnt     = 0;

while (<DATA>) {
    if ( $_ =~ /^DATA(?:\n|\r\n)$/ ) {
        print "In second If statement DATA if statement\n";
        @linesBuffer      = ();
        $lineBufferStatus = 0;
    }
    elsif ( $_ =~ /^\.(?:\n|\r\n)$/ ) {
        if ( int(@linesBuffer) > 0 ) {
            while ( int(@linesBuffer) > 0 ) {
                print shift(@linesBuffer);
            }
            print "\r\n.\r\nQUIT\r\n";
            $msgUpdateCnt++;
        }
        $lineBufferStatus = 1;
    }
    elsif ( $lineBufferStatus == 0 ) {
        push( @linesBuffer, $_ );
    }
}

if ( int(@linesBuffer) > 0 ) {
    my $perMsgRcptBufCnt = 0;
    while ( int(@linesBuffer) > 0 ) {
        print shift(@linesBuffer);
    }
    print "\r\n.\r\nQUIT\r\n";
    $msgUpdateCnt++;
}

if ( $msgUpdateCnt == 1 ) {
    print "---------------------EQUALS 1--------------------\n";
}
elsif ( $msgUpdateCnt > 1 ) {
    print "---------------------EQUALS 2--------------------\n";
}

__DATA__
EHLO testdomain.com
MAIL FROM:<outsideuser01@outdomain.com> SIZE=1016 BODY=7BIT
RCPT TO:<testuser01@testdomain.com>
RCPT TO:<testuser02@testdomain.com>
DATA
Received: from 1.1.1.1(helo=mvastnlufgt.mgrjofpxydauvu.info) by with esmtpa (Exim 4.69) (envelope-from ) id 1MM6HB-1114fm-3T for testuser01@testdomain.com; Sun, 9 Jan 2011 23:07:59 +0100
From: "outsideuser01" <outsideuser01@outdomain.com>
To: <testuser01@testdomain.com>
Subject: Re: good evening
Date: Sun, 9 Jan 2011 23:07:59 +0100
MIME-Version: 1.0
Content-Type: multipart/alternative; boundary="----=_ppstll_21_19_24"
X-Priority: 3
X-Mailer: choakmps.33
Message-ID: <5C74A867.4001970@aclighting.com>

------=_ppstll_21_19_24
Content-Type: text/plain; charset="iso-8859-1"
Content-Transfer-Encoding: quoted-printable

H i
website: http://canadianph.tz=
hzvebp.ajwcd.ru - CanadianPharmacy
------=_ppstll_21_19_24
Content-Type: text/html; charset="iso-8859-1"
Content-Transfer-Encoding: quoted-printable

<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN">
<HTML><HEAD>
<META http-equiv=3DContent-Type content=3D"text/html; charset=3Diso-8859-=
1">
<STYLE></STYLE>
</HEAD>
<BODY>
H i <br>
website: http://canadianph.tzhzvebp.ajwcd.ru - CanadianPharmacy =
</BODY></HTML>
------=_ppstll_21_19_24--
.
RSET
MAIL FROM:<Megan@bankofdeerfield.com> SIZE=1016 BODY=7BIT
RCPT TO:<testuser36@testdomain.com>
RCPT TO:<testuser22@testdomain.com>
RCPT TO:<testuser99@testdomain.com>
DATA
Received: from mdmcfntioehaoqtcmjdmjmcm (192.168.1.33) by wonderware.com (80.149.49.194) with Microsoft SMTP Server id 8.0.685.24; Sun, 9 Jan 2011 23:08:07 +0100
Message-ID: <4D2A2627.204070@pacunion.com>
Date: Sun, 9 Jan 2011 23:08:07 +0100
From: "Megan" <Megan@bankofdeerfield.com>
User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.2.9) Gecko/20100921 Thunderbird/3.1.4
MIME-Version: 1.0
To: <testuser36@testdomain.com>
Subject: good morning
Content-Type: multipart/alternative; boundary="------------02070800106050608090806"
X-Priority: 3
X-Mailer: choakmps.33
Message-ID: <5C74A867.4001970@aclighting.com>

------=_ppstll_21_19_24
Content-Type: text/plain; charset="iso-8859-1"
Content-Transfer-Encoding: quoted-printable

Hi
website: http://canadianp=
kerur.ru - CanadianPharmacy
------=_ppstll_21_19_24
Content-Type: text/html; charset="iso-8859-1"
Content-Transfer-Encoding: quoted-printable

<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN">
<HTML><HEAD>
<META http-equiv=3DContent-Type content=3D"text/html; charset=3Diso-8859-=
1">
<STYLE></STYLE>
</HEAD>
<BODY>
H i <br>
website: http://canadianp.kerur.ru - GermanPharmacy =
</BODY></HTML>
------=_ppstll_21_19_24--
.
RSET
MAIL FROM:<robertson@roberts.com> SIZE=9916 BODY=7BIT
RCPT TO:<testuser2937@testdomain.com>
RCPT TO:<testuser22@testdomain.com>
DATA
Received: from testdomains (172.12.223.44) by wonderware.com (32.34.49.194) with Microsoft 23:08:07 +0100
Message-ID: <4D2A2627.204070@pacunion.com>
Date: Sat, 8 Jan 2011 17:18:27 +0100
From: "Megan" <robertson@roberts.com>
User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.2.9) Gecko/20100921 Thunderbird/3.1.4
MIME-Version: 1.0
To: <testuser2937@testdomain.com>
Subject: morning
Content-Type: multipart/alternative; boundary="------------02070800106050608090806"
X-Priority: 3
X-Mailer: choakmps.33
Message-ID: <5C74A867.4001970@aclighting.com>

------=_ppstll_21_19_24
Content-Type: text/plain;
.
RSET
Using the third message from the __DATA__ section as a sample, this is the before and after of what I am trying to accomplish.
--------THIS IS THE CURRENT OUTPUT--------
Received: from testdomains (172.12.223.44) by wonderware.com (32.34.49.194) with Microsoft 23:08:07 +0100
Message-ID: <4D2A2627.204070@pacunion.com>
Date: Sat, 8 Jan 2011 17:18:27 +0100
From: "Megan" <robertson@roberts.com>
User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.2.9) Gecko/20100921 Thunderbird/3.1.4
MIME-Version: 1.0
To: <testuser2937@testdomain.com>

--------THIS IS THE DESIRED OUTPUT (I have captured both RCPT TO: lines and appended them after the original To:)--------
Received: from testdomains (172.12.223.44) by wonderware.com (32.34.49.194) with Microsoft 23:08:07 +0100
Message-ID: <4D2A2627.204070@pacunion.com>
Date: Sat, 8 Jan 2011 17:18:27 +0100
From: "Megan" <robertson@roberts.com>
User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.2.9) Gecko/20100921 Thunderbird/3.1.4
MIME-Version: 1.0
To: <testuser2937@testdomain.com>
To: <testuser2937@testdomain.com>
To: <testuser22@testdomain.com>

Replies are listed 'Best First'.
Re: Help appending data based on location
by jethro (Monsignor) on Jan 25, 2011 at 21:02 UTC

    Parsing email is not a trivial task, so using a CPAN library might be safer. I can't recommend anything (I haven't used one yet), but someone else might have an idea what to use. I found http://search.cpan.org/~markov/MailTools-2.07 with a quick search; check out the demos.

    What you want is a parser. For simple line-oriented tasks an FSM (Finite State Machine) is often a good practical solution that keeps the complexity in check. In Re^3: How to parse a text file there is a simple example of an FSM. What you do with $lineBufferStatus is already a simple FSM, but you would need at least one more state that says "I'm at the To-line now".
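A minimal sketch of that extra To-line step, under some assumptions: add_rcpt_headers is a hypothetical helper name, the example.com addresses are made up, and each message is assumed to carry a single, unfolded To: header. Rather than a separate state, the To-line is detected by a check inside the headers state, which amounts to the same thing for input this simple.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: takes the lines of an SMTP session and returns the
# message lines, with one "To:" header per RCPT TO address added directly
# after the original To: header.
sub add_rcpt_headers {
    my @lines = @_;
    my $state = 'envelope';    # envelope -> headers -> body -> envelope
    my ( @rcpts, @out );

    for my $line (@lines) {
        $line =~ s/\r?\n\z//;    # normalise line endings

        if ( $state eq 'envelope' ) {
            if ( $line =~ /^RCPT\s+TO:\s*(.*)$/i ) {
                push @rcpts, $1;    # remember the recipient for later
            }
            elsif ( $line eq 'DATA' ) {
                $state = 'headers';
            }
            # any other envelope line (EHLO, MAIL FROM, RSET) is dropped
        }
        elsif ( $line eq '.' ) {    # end of message: forget the recipients
            @rcpts = ();
            $state = 'envelope';
        }
        elsif ( $state eq 'headers' ) {
            push @out, $line;
            if ( $line =~ /^To:/i ) {    # the "I'm at the To-line now" step
                push @out, "To: $_" for @rcpts;
            }
            $state = 'body' if $line eq '';    # a blank line ends the headers
        }
        else {                                 # body
            push @out, $line;
        }
    }
    return @out;
}

# Made-up sample session; split /^/ keeps one element per line.
my @session = split /^/, <<'END';
EHLO example.com
MAIL FROM:<sender@example.com>
RCPT TO:<one@example.com>
RCPT TO:<two@example.com>
DATA
From: <sender@example.com>
To: <one@example.com>
Subject: test

hello
.
RSET
END

print "$_\n" for add_rcpt_headers(@session);
```

With this sample the envelope lines disappear and the two RCPT TO addresses show up as extra To: headers right after the original one, which is the layout asked for in the question.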

    By the way, using int() to check whether an array is empty is rather unusual. Well, TMTOWTDI, but the following lines are equivalent, and the last of them is the commonly used one and is as safe as your version:

    if (int(@array)>0) {
    if (scalar(@array)>0) {
    if (@array>0) {
    if (@array) {
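The four tests agree because each one puts the array in scalar context, where an array yields its element count. A small sketch that runs all four on an empty and a non-empty array (emptiness_checks is a hypothetical name used only for this illustration):

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: evaluates the four tests and returns 1 or 0 for each.
sub emptiness_checks {
    my @array = @_;
    return (
        ( int(@array) > 0    ? 1 : 0 ),
        ( scalar(@array) > 0 ? 1 : 0 ),
        ( @array > 0         ? 1 : 0 ),
        ( @array             ? 1 : 0 ),
    );
}

print join( ' ', emptiness_checks() ), "\n";              # prints "0 0 0 0"
print join( ' ', emptiness_checks( 'a', 'b' ) ), "\n";    # prints "1 1 1 1"
```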
      Hi, thanks for the reply. The only problem with using a CPAN mail module is that the input is not always in mail format. What I need to do is only extract certain data from the message files, not handle actual mail.
Re: Help appending data based on location
by afresh1 (Hermit) on Jan 25, 2011 at 21:39 UTC

    While I agree that an FSM is what you want and that fully parsing an SMTP session is difficult and should be left to the professionals, I think the main thing missing here is keeping track of the RCPT TO addresses and then outputting them where you want them.

    I did this with an @rcpts array and appended it after the headers. To do that, you have to keep track of where you are in the conversation so you know when to output the additional headers. Here is a fixed-up version that uses your __DATA__:

    #!/usr/bin/perl
    use strict;
    use warnings;

    my @rcpts        = ();
    my @linesBuffer  = ();
    my $msgUpdateCnt = 0;
    my $state        = 'unknown';

    while (<DATA>) {
        s/\r?\n//;    # strip newlines, since we don't know if they are right

        if ( $state eq 'headers' or $state eq 'body' ) {
            if ( $_ eq '.' ) {    # end of message, reset
                $state = 'unknown';
            }
            elsif ( $state eq 'headers' and $_ eq '' ) {    # end of headers
                foreach my $rcpt (@rcpts) {
                    push @linesBuffer, 'To: ' . $rcpt;
                }
                $state = 'body';
            }
        }
        elsif ( $_ eq 'DATA' ) {
            $state = 'headers';
        }
        elsif (/^RCPT \s+ TO: \s* (.*)$/xms) {
            push @rcpts, $1;
        }
        elsif ( $_ eq 'RSET' or $_ eq 'QUIT' ) {    # end of message
            if (@linesBuffer) {
                print join "\r\n", @linesBuffer, 'QUIT';
                print "\r\n";
                $msgUpdateCnt++;
            }

            # reset
            @linesBuffer = ();
            @rcpts       = ();
            $state       = 'unknown';
            next;
        }

        if ( @linesBuffer || $_ ) {
            push @linesBuffer, $_;
        }
    }

    if (@linesBuffer) {
        print join "\r\n", @linesBuffer, 'QUIT';
        print "\r\n";
        $msgUpdateCnt++;
    }

    print "---------------------EQUALS $msgUpdateCnt--------------------\n";

    __DATA__
    l8rZ,
    --
    andrew
      Thanks for the help/reply. I've been trying to modify your code because it currently does not remove the lines in each section that come before the DATA line (^DATA). So at the moment the output for the sample data looks like this:
      EHLO testdomain.com
      MAIL FROM:<outsideuser01@outdomain.com> SIZE=1016 BODY=7BIT
      RCPT TO:<testuser01@testdomain.com>
      RCPT TO:<testuser02@testdomain.com>
      DATA
      Received: from 1.1.1.1(helo=mvastnlufgt.mgrjofpxydauvu.info) by with esmtpa (Exim 4.69) (envelope-from ) id 1MM6HB-1114fm-3T for testuser01@testdomain.com; Sun, 9 Jan 2011 23:07:59 +0100
      From: "outsideuser01" <outsideuser01@outdomain.com>
      To: <testuser01@testdomain.com>
      Subject: Re: good evening
      Date: Sun, 9 Jan 2011 23:07:59 +0100
      MIME-Version: 1.0
      To: <testuser01@testdomain.com>
      To: <testuser02@testdomain.com>
      It's correctly appending the RCPT lines, but it is not removing the lines before the line that matches ^DATA. It should look like this. Note how all lines prior to ^DATA are removed, but only for the corresponding section:
      Received: from 1.1.1.1(helo=mvastnlufgt.mgrjofpxydauvu.info) by with esmtpa (Exim 4.69) (envelope-from ) id 1MM6HB-1114fm-3T for testuser01@testdomain.com; Sun, 9 Jan 2011 23:07:59 +0100
      From: "outsideuser01" <outsideuser01@outdomain.com>
      To: <testuser01@testdomain.com>
      Subject: Re: good evening
      Date: Sun, 9 Jan 2011 23:07:59 +0100
      MIME-Version: 1.0
      To: <testuser01@testdomain.com>
      To: <testuser02@testdomain.com>
        I've been trying to modify your code b/c

        Show your efforts
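For the envelope lines that still show up before DATA, one possible adjustment is to buffer a line only while the state is 'headers' or 'body'. In the posted version the final push runs in every state, which appears to be why EHLO, MAIL FROM and RCPT TO end up in the output. The following is a sketch under assumptions, not afresh1's code: split_messages is a hypothetical helper name, the sample addresses are made up, and each message is assumed to end with a line containing only a dot.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: returns one array ref of lines per message. Only lines
# seen after DATA are buffered, so EHLO, MAIL FROM and RCPT TO never reach
# the output.
sub split_messages {
    my @lines = @_;
    my ( @rcpts, @buffer, @messages );
    my $state = 'unknown';

    for my $line (@lines) {
        $line =~ s/\r?\n\z//;    # strip the line ending

        if ( $state eq 'headers' or $state eq 'body' ) {
            if ( $line eq '.' ) {    # end of message: store it and reset
                push @messages, [@buffer];
                @buffer = ();
                @rcpts  = ();
                $state  = 'unknown';
                next;
            }
            if ( $state eq 'headers' and $line eq '' ) {    # end of headers
                push @buffer, "To: $_" for @rcpts;
                $state = 'body';
            }
            push @buffer, $line;    # buffering happens only inside a message
        }
        elsif ( $line eq 'DATA' ) {
            $state = 'headers';
        }
        elsif ( $line =~ /^RCPT \s+ TO: \s* (.*)$/x ) {
            push @rcpts, $1;
        }
        # everything else outside a message (EHLO, MAIL FROM, RSET) is skipped
    }

    push @messages, [@buffer] if @buffer;    # keep an unterminated last message
    return @messages;
}

# Made-up sample session with a single message.
my @session = split /^/, <<'END';
EHLO example.com
MAIL FROM:<sender@example.com>
RCPT TO:<one@example.com>
RCPT TO:<two@example.com>
DATA
From: <sender@example.com>
To: <one@example.com>

hello
.
RSET
END

for my $message ( split_messages(@session) ) {
    print join( "\r\n", @$message, 'QUIT' ), "\r\n";
}
```

Like the posted version, this adds the To: lines at the end of the header block rather than directly after the original To: header.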