r2ro has asked for the wisdom of the Perl Monks concerning the following question:

Hi Monks, I'm new to Perl and have very limited know-how as of now. I hope you can help me reformat the multi-line data below:

Arrival Time: May 2, 2013 10:37:50.813000000

From: <sip:639gwhuaping01-14@119.38.228.43>;tag=70c8b229-1c

To: <sip:639gwhuaping01-14@119.38.228.43>

Arrival Time: May 2, 2013 10:38:05.274000000

From: <sip:639gwhuaping01-01@119.38.228.43>;tag=70c8b229-2

To: <sip:639gwhuaping01-01@119.38.228.43>

Arrival Time: May 2, 2013 10:38:05.451000000

From: <sip:639gwhuaping01-11@119.38.228.43>;tag=70c8b229-16

To: <sip:639gwhuaping01-11@119.38.228.43>

User-Agent: Quintum/1.0.0 SN/0030E130409A SW/P108-09-10

to this

Arrival Time|From|To|User-Agent

May 2, 2013 10:37:50.813000000|<sip:639gwhuaping01-14@119.38.228.43>;tag=70c8b229-1c|<sip:639gwhuaping01-14@119.38.228.43>|--

May 2, 2013 10:38:05.274000000|<sip:639gwhuaping01-01@119.38.228.43>;tag=70c8b229-2|<sip:639gwhuaping01-01@119.38.228.43>|--

May 2, 2013 10:38:05.451000000|<sip:639gwhuaping01-11@119.38.228.43>;tag=70c8b229-16|<sip:639gwhuaping01-11@119.38.228.43>|Quintum/1.0.0 SN/0030E130409A SW/P108-09-10

my code:
use strict;

my $infile  = $ARGV[0];
my $outfile = $ARGV[1];

open my $in,  "<", $infile  or die $!;
open my $out, ">", $outfile or die $!;

print $out "Arrival Time|From|To|User-Agent\n";

my $line;
while ( <$in> ) {
    s/ //;
    s/ //;
    s/ Malay Peninsula Standard Time//;
    s/From:/\|From:/;
    s/To:/\|To:/;
    s/User-Agent:/\|User-Agent:/;
    $line .= $_ if ($_ =~ m/Arrival Time:|\|From:|\|To:|\|User-Agent:/);
    chomp $line if /^Arrival Time:|\|From:/;
    if ($line =~ m/User-Agent:/) {
        $line =~ s/^\n//;
    }
    $line =~ s/Arrival Time: //;
    $line =~ s/From: //;
    $line =~ s/To: //;
    $line =~ s/User-Agent: //;
    if (eof) { chomp $line }
}
print $out $line;

close $in;
close $out;

The input is a sequence of lines beginning with "Arrival Time:", "From:", "To:" and sometimes, but not always, "User-Agent:". When "User-Agent:" is missing, I need to substitute "--".

I'm using Perl 5.18 on Windows.

Your response is very much appreciated.

Thank you in advance

r2ro

Replies are listed 'Best First'.
Re: Arranging multiple lines
by hdb (Monsignor) on Jun 03, 2013 at 10:21 UTC

    Two approaches seem logical to me. First, similar to yours, based on regular expressions. I would slurp in the data in one go though and then replace superfluous pieces, like this:

    use strict;
    use warnings;

    # slurp the whole input in one go
    my $data;
    {
        local $/;
        $data = <DATA>;
    }

    # join the lines of each record with "|", dropping the field labels
    $data =~ s/\n\s*\n(From: |To: |User-Agent: )/|/g;
    # records with only three fields have no User-Agent: append "--"
    $data =~ s/Arrival Time: ([^|]*\|[^|]*\|[^|\n]*)\n/$1|--\n/g;
    # the substitution above skips records that do have a User-Agent,
    # so strip the remaining "Arrival Time: " labels here
    $data =~ s/^Arrival Time: //mg;

    print "Arrival Time|From|To|User-Agent\n\n";
    print $data;

    __DATA__
    Arrival Time: May 2, 2013 10:37:50.813000000

    From: <sip:639gwhuaping01-14@119.38.228.43>;tag=70c8b229-1c

    To: <sip:639gwhuaping01-14@119.38.228.43>

    Arrival Time: May 2, 2013 10:38:05.274000000

    From: <sip:639gwhuaping01-01@119.38.228.43>;tag=70c8b229-2

    To: <sip:639gwhuaping01-01@119.38.228.43>

    Arrival Time: May 2, 2013 10:38:05.451000000

    From: <sip:639gwhuaping01-11@119.38.228.43>;tag=70c8b229-16

    To: <sip:639gwhuaping01-11@119.38.228.43>

    User-Agent: Quintum/1.0.0 SN/0030E130409A SW/P108-09-10

    Alternatively, and probably easier to maintain, is to read the data into a hash and print/empty whenever a dataset is complete, like this (with the same __DATA__ segment as above):

    use strict;
    use warnings;

    sub printdata {
        my $data = shift;
        $$data{"User-Agent"} //= "--";
        print join "|", @$data{ ( "Arrival Time", "From", "To", "User-Agent" ) };
        print "\n\n";
        %$data = ();
    }

    my %data;
    my ($item, $value);

    print "Arrival Time|From|To|User-Agent\n\n";

    while(<DATA>){
        chomp;
        next unless ($item, $value) = /^(.*?): (.*)/;
        printdata \%data if $item eq "Arrival Time" and %data;
        $data{$item} = $value;
    }
    printdata \%data;

    __DATA__
    Arrival Time: May 2, 2013 10:37:50.813000000
    ...

      Thank you very much hdb,

      but again you have to forgive my ignorance on this, I got lost in incorporating your code with input and output file arguments. :(

      Please bear with me, I just started studying perl 2 days ago.

      use strict;
      use warnings;

      my $infile  = $ARGV[0];
      my $outfile = $ARGV[1];

      open my $in,  "<", $infile  or die $!;
      open my $out, ">", $outfile or die $!;

      sub printdata {
          my $data = shift;
          $$data{"User-Agent"} //= "--";
          print join "|", @$data{ ( "Arrival Time", "From", "To", "User-Agent" ) };
          print "\n\n";
          %$data = ();
      }

      my %data;
      my ($item, $value);

      print "Arrival Time|From|To|User-Agent\n\n";

      while(<$in>){
          chomp;
          next unless ($item, $value) = /^(.*?): (.*)/;
          printdata \%data if $item eq "Arrival Time" and %data;
          $data{$item} = $value;
      }
      printdata \%data;

      close $in;
      close $out;

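        One way to do it is to pass the $out handle into the printdata function. Mind the commas after $out in this example: print $out LIST takes no comma after the filehandle, while the call printdata $out, \%data needs one.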

        use strict;
        use warnings;

        sub printdata {
            my $out  = shift;   # output filehandle
            my $data = shift;   # reference to the record hash
            $$data{"User-Agent"} //= "--";
            print $out join "|", @$data{ ( "Arrival Time", "From", "To", "User-Agent" ) };
            print $out "\n\n";
            %$data = ();
        }

        open my $out, ">", "tmp.txt" or die "Cannot open tmp.txt: $!";

        my %data;
        my ($item, $value);

        print $out "Arrival Time|From|To|User-Agent\n\n";

        while(<DATA>){
            chomp;
            next unless ($item, $value) = /^(.*?): (.*)/;
            printdata $out, \%data if $item eq "Arrival Time" and %data;
            $data{$item} = $value;
        }
        printdata $out, \%data;

        Alternatively, you could assemble a string in printdata, return it from the sub and print later.
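
A minimal sketch of that second idea, assuming the same "Key: value" input as above. The names format_record, parse_records and @FIELDS are made up for this illustration and are not from the thread. The sub only builds and returns strings; the caller decides where to print them, here to the file named by the second command-line argument. Unlike the code above, this version writes one record per line without blank lines in between.

```perl
use strict;
use warnings;

# Output columns, in order (hypothetical helper name).
my @FIELDS = ("Arrival Time", "From", "To", "User-Agent");

# Build one pipe-separated line from a record hash;
# any missing field is replaced by "--".
sub format_record {
    my ($rec) = @_;
    return join("|", map { $rec->{$_} // "--" } @FIELDS) . "\n";
}

# Read "Key: value" lines from a filehandle and return
# a list of formatted record strings instead of printing them.
sub parse_records {
    my ($in) = @_;
    my (%rec, @lines);
    while (my $line = <$in>) {
        chomp $line;
        $line =~ s/\r$//;                      # tolerate Windows line endings
        next unless $line =~ /^(.*?): (.*)/;   # skip blank or unrelated lines
        my ($item, $value) = ($1, $2);
        # A new "Arrival Time" starts the next record: flush the previous one.
        if ($item eq "Arrival Time" and %rec) {
            push @lines, format_record(\%rec);
            %rec = ();
        }
        $rec{$item} = $value;
    }
    push @lines, format_record(\%rec) if %rec;  # flush the last record
    return @lines;
}

# Usage: perl script.pl input.txt output.txt
if (@ARGV == 2) {
    my ($infile, $outfile) = @ARGV;
    open my $in,  "<", $infile  or die "Cannot read $infile: $!";
    open my $out, ">", $outfile or die "Cannot write $outfile: $!";
    print $out join("|", @FIELDS), "\n";
    print $out parse_records($in);
    close $in;
    close $out or die "Cannot close $outfile: $!";
}
```

Because the subs never touch a filehandle for output, the same code can print to STDOUT, to a file, or collect the lines for sorting first.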