hiradhu has asked for the wisdom of the Perl Monks concerning the following question:

Hi Perl Monks, this is my second attempt. I received good suggestions from a Perl Monk earlier, but I don't know how to apply them. Can you please help me again? Here is what I need: I have multiple files (the number is only known at runtime) named "su1_file.txt" etc., with the following format.
10/22/2008 00:22:01	Start testing --- File: Open Mode: Single User, RPVersion: 2
10/22/2008 00:22:01	Start ------- 1:1:11002_CustomerAdd_WithNotes_Rq.xml
10/22/2008 00:22:04	New QBXMLRP	0.328	65.03	0.00	0.00
10/22/2008 00:22:06	OpenConnection	1.031	65.03	0.00	0.00
10/22/2008 00:22:08	BeginSession	0.453	65.03	0.00	0.00
10/22/2008 00:22:14	ProcessRequest	4.984	65.46	0.43	0.83
10/22/2008 00:22:15	EndSession	0.016	65.46	0.00	0.00
10/22/2008 00:22:15	End --------- 1:1:11002_CustomerAdd_WithNotes_Rq.xml
10/22/2008 00:22:15	Start ------- 1:1:11004_AccountAdd_1_Rq.xml
10/22/2008 00:22:16	BeginSession	0.031	65.46	0.00	0.00
10/22/2008 00:22:17	ProcessRequest	0.422	65.57	0.10	0.59
10/22/2008 00:22:18	EndSession	0.000	65.57	0.00	0.00
10/22/2008 00:22:18	End --------- 1:1:11004_AccountAdd_1_Rq.xml
10/22/2008 00:22:18	Start ------- 1:1:11007_EmployeeAdd_Rq.xml
10/22/2008 00:22:20	BeginSession	0.094	65.57	0.00	0.00
10/22/2008 00:22:21	ProcessRequest	0.766	65.55	-0.02	0.88
10/22/2008 00:22:22	EndSession	0.016	65.55	0.00	0.00
10/22/2008 00:22:22	End --------- 1:1:11007_EmployeeAdd_Rq.xml
I have to parse through the files and get only the XML names and the first number after ProcessRequest for each XML. My code for that goes like this:
while (<INPUT>) {
    chomp($_);
    # Look for title line
    if ($_ =~ /Start -/) {
        my @sdk_line = split(/\t/, $_);
        push(@temp, substr($sdk_line[1], 18));
    }
}
while (<INPUT>) {
    chomp($_);
    # find process time for that title
    if ($_ =~ /Process/) {
        print split(/\t/, $_), "\n";
        my @sdk_line = split(/\t/, $_);
        push(@temp, $sdk_line[2]);
    }
}
I have to write this to an output file in the format below. SU1 and SU2 are values from the files su1_file.txt and su2_file.txt, which are tab-separated. The output should also be tab-separated so that I can open it in Excel.
Request Name	SU1	SU2	SU3	SU-Avg	MU1	MU2	MU3	MU-Avg
11002_CustomerAdd_WithNotes_Rq.xml	4.984	6.766	6.766	2.645	2.141	6.125	6.766	3.006
11004_AccountAdd_1_Rq.xml	0.422	1.203	1.203	0.404	0.297	1.062	1.203	0.512
11007_EmployeeAdd_Rq.xml	0.766	0.359	0.359	0.212	0.250	0.281	0.359	0.178
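As a minimal sketch of producing one such tab-separated row (the request name and values below are made-up illustrations, not taken from the real log files), `join "\t"` plus `sprintf` handles both the separators and the rounded average:

```perl
use strict;
use warnings;
use List::Util 'sum';

# One output row: request name, the per-file values, and their average,
# joined with tabs so the file opens cleanly in a spreadsheet.
my $name = '11004_AccountAdd_1_Rq.xml';   # example values, not real data
my @su   = (0.422, 1.203, 1.203);
my $avg  = sprintf "%.3f", sum(@su) / @su;
print join("\t", $name, @su, $avg), "\n";
```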
I have hardcoded my script to parse only 3 files. How do I do it for a multiple/unknown number of files? This is my code:
## Get the XML names ##
sub GetXMLNames {
    open(INPUT, $inputPath);
    my @temp_xml = ();
    while (<INPUT>) {
        chomp($_);
        # Look for title line
        if ($_ =~ /Start -/) {
            my @sdk_line = split(/\t/, $_);
            push(@temp, substr($sdk_line[1], 18));
        }
    }
    close INPUT;
    return @temp;
}

## Get the Process time ##
sub GetOnlyProcessTime {
    open(LOG, ">>$parseSDKPerfResultsLog");
    open(INPUT, $inputPath);
    my @temp = ();
    while (<INPUT>) {
        chomp($_);
        # find process time for that title
        if ($_ =~ /Process/) {
            my @sdk_line = split(/\t/, $_);
            push(@temp, $sdk_line[2]);
        }
    }
    my $number = scalar(@temp);
    if ($number < $Total_number_of_requests) {
        my $missing = $Total_number_of_requests - $number;
        print LOG "$missing request(s) in $inputPath have not got executed. Hence exiting.\n";
        close LOG;
        close INPUT;
        exit;
    }
    close INPUT;
    return @temp;
}

## Parse all the logs ##
sub parseSDKPerfmonLogs {
    open(LOG, ">>$parseSDKPerfResultsLog");
    for (my $su = 1; $su <= $su_files; $su++) {
        $su_inputfile = "su" . $su . "_sdkperfmonlog.txt";
        push(@su_inputfile, $su_inputfile);
    }
    for (my $mu = 1; $mu <= $mu_files; $mu++) {
        $mu_inputfile = "mu" . $mu . "_sdkperfmonlog.txt";
        push(@mu_inputfile, $mu_inputfile);
    }

    ############# Get the Request file names in the first column #############
    $inputPath = $ResFolderPath . "\\su" . $su_files . "_sdkperfmonlog.txt";
    @xml_names = GetXMLNames($inputPath);
    $Total_number_of_requests = @xml_names;
    ############# end of Get the request names in results file #############

    ############# Get the SU process time #############
    foreach $sufile (@su_inputfile) {
        #print $sufile;
        $inputPath = $ResFolderPath . "\\" . $sufile;
        $filenumber = substr($sufile, 2, 1);
        @temp = GetOnlyProcessTime($inputPath);
        if ($filenumber <= $su_files) {
            if    ($filenumber == 1) { @su_first  = @temp; }
            elsif ($filenumber == 2) { @su_second = @temp; }
            elsif ($filenumber == 3) { @su_third  = @temp; }
        }
    }
    ############# End of Get the SU process time #############

    ############# Get the MU process time #############
    foreach $mufile (@mu_inputfile) {
        #print $sufile;
        $inputPath = $ResFolderPath . "\\" . $mufile;
        $filenumber = substr($mufile, 2, 1);
        @temp = GetOnlyProcessTime($inputPath);
        if ($filenumber <= $mu_files) {
            if    ($filenumber == 1) { @mu_first  = @temp; }
            elsif ($filenumber == 2) { @mu_second = @temp; }
            elsif ($filenumber == 3) { @mu_third  = @temp; }
        }
    }
    ############# End of Get the MU process time #############

    ############# Write everything to the output file #############
    open(OUTPUT, ">>$outputFile");
    for (my $i = 0; $i < $Total_number_of_requests; $i++) {
        my $su_average = sprintf("%.3f", ($su_first[$i] + $su_second[$i] + $su_third[$i]) / $su_files);
        my $mu_average = sprintf("%.3f", ($mu_first[$i] + $mu_second[$i] + $mu_third[$i]) / $mu_files);
        print(OUTPUT "$xml_names[$i]\t$su_first[$i]\t$su_second[$i]\t$su_third[$i]\t$su_average\t$mu_first[$i]\t$mu_second[$i]\t$mu_third[$i]\t$mu_average\n");
    }
    print LOG "All the values from performance logs are available in $outputFile\n";
    close OUTPUT;
    ############# End of Write everything to the output file #############
}

Replies are listed 'Best First'.
Re: Parsing many files and output to one file. Pls HELP
by GrandFather (Saint) on Nov 11, 2008 at 07:48 UTC

    Addressing just the point at issue and ignoring the vast majority of your "sample" code, the following demonstrates how you can handle an arbitrary number of columns (the equivalent of your multiple input files, perhaps) over an arbitrary number of rows by using an array of arrays:

    use strict;
    use warnings;

    my @rows;

    while (<DATA>) {
        chomp;
        next unless length;
        my @columns = split ' ';
        push @rows, \@columns;
    }

    printf " %2d ", $_ for 1 .. @{$rows[0]};
    print " sum\n";

    for my $row (@rows) {
        my $sum;
        $sum += $_ for @$row;
        printf " %3d", $_ for @$row;
        printf " %4d\n", $sum;
    }

    __DATA__
    218 156 350 994 137 729 656 977 582 957
    80 10 686 679 881 486 272 927 971 153
    136 694 724 326 536 99 620 564 290 703
    402 835 397 291 886 714 580 861 80 634
    651 144 787 722 125 397 323 261 969 782

    Prints:

      1   2   3   4   5   6   7   8   9  10  sum
     218 156 350 994 137 729 656 977 582 957 5756
      80  10 686 679 881 486 272 927 971 153 5145
     136 694 724 326 536  99 620 564 290 703 4692
     402 835 397 291 886 714 580 861  80 634 5680
     651 144 787 722 125 397 323 261 969 782 5161

    Perl reduces RSI - it saves typing
Re: Parsing many files and output to one file. Pls HELP
by moritz (Cardinal) on Nov 11, 2008 at 07:21 UTC
    while (<INPUT>) {
        chomp($_);
        # Look for title line
        if ($_ =~ /Start -/) {
            my @sdk_line = split(/\t/, $_);
            push(@temp, substr($sdk_line[1], 18));
        }
    }
    while (<INPUT>) {
        chomp($_);
        # find process time for that title
        if ($_ =~ /Process/) {
            print split(/\t/, $_), "\n";
            my @sdk_line = split(/\t/, $_);
            push(@temp, $sdk_line[2]);
        }
    }

    The second while loop will never be executed, because the first one exhausts all lines in the INPUT file.
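    One way around that is a single loop that handles both patterns in one pass. A minimal sketch, using two sample lines adapted from the question's log format (read here from an in-memory string rather than a real file):

```perl
use strict;
use warnings;

# Sample lines in the tab-separated layout shown in the question
my $log = "10/22/2008 00:22:15\tStart ------- 1:1:11004_AccountAdd_1_Rq.xml\n"
        . "10/22/2008 00:22:17\tProcessRequest\t0.422\t65.57\t0.10\t0.59\n";
open my $fh, '<', \$log or die $!;

my @temp;
while (my $line = <$fh>) {
    chomp $line;
    my @sdk_line = split /\t/, $line;
    if ($line =~ /Start -/) {
        push @temp, substr($sdk_line[1], 18);  # drop "Start ------- 1:1:"
    }
    elsif ($line =~ /ProcessRequest/) {
        push @temp, $sdk_line[2];              # first number after the name
    }
}
print join("\t", @temp), "\n";
```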

    Also please indent your code in a way that makes it obvious to see which loop ends where.

    I have hardcoded my script to parse only 3 files. How do I do it for a multiple/unknown number of files?

    By iterating over the file names, open each, process it, and close it again. Something like this:

    for my $fn (@file_names) {
        open my $handle, '<', $fn or die "Can't open `$fn' for reading: $!";
        # do something with the file here
        # and write to output file
        close $handle;
    }
      Thanks. Those two while loops are from different subs. I have parsed 6 files (to get the XML names and the process times) and have them in arrays. My program works fine for 6 files. Now, if I have more than 6 files (n files, an unknown number), how do I dynamically name arrays, say $myArray$i? I cannot read one file at a time and output it: the output file should be printed column-wise, with each column coming from one input file.
        Instead of naming them $myArray1, $myArray2 store them all in a common array. How that works is described in perllol and perlreftut, as well as in every good Perl book.
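        A minimal sketch of that idea (the inner lists here are made-up stand-ins for each file's ProcessRequest times): instead of @myArray1, @myArray2, ..., push one array reference per file into a single top-level array, then take column-wise slices with map.

```perl
use strict;
use warnings;

# One anonymous array per input file, all held in a single array of arrays.
my @per_file_times;
for my $file_times ([4.984, 0.422, 0.766], [6.766, 1.203, 0.359]) {
    push @per_file_times, $file_times;   # $per_file_times[$i] is file i's column
}

# Column-wise access: the value for request 0 from every file
my @row = map { $_->[0] } @per_file_times;
print join("\t", @row), "\n";
```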
Re: Parsing many files and output to one file. Pls HELP
by Cristoforo (Curate) on Nov 11, 2008 at 21:17 UTC
    I think the problem can be solved by the code below for any number of su and mu files. It doesn't check that every file has the same number of requests (you would need to add a test for that). Also, the glob function assumes all the files are in the current working directory; if that's not the case, a path would need to be prepended to the glob expression. I used Sort::Naturally to sort the respective files saved in the arrays and List::Util for the sum function.

    Chris

    #!/usr/bin/perl
    use strict;
    use warnings;
    use Sort::Naturally;
    use List::Util 'sum';
    use List::Compare;

    my %data;
    my %file_requests;
    my $base;

    my @su_files = nsort glob "su[1-9]*_sdkperfmonlog.txt";
    my @mu_files = nsort glob "mu[1-9]*_sdkperfmonlog.txt";

    for my $file (@su_files, @mu_files) {
        my $request;
        open my $fh, "<", $file or die "Unable to open $file for reading. $!";
        while (<$fh>) {
            if (/Start\s+-+\s+\d+:\d+:(.+)/) {
                $request = $1;
            }
            elsif (/ProcessRequest\t([.\d]+)/) {
                $data{$request}{$file} = $1;
                push @{ $file_requests{$file} }, $request;
            }
        }
        close $fh or die "Unable to close $file. $!";

        # validate
        $base ||= $file_requests{$file};
        my $lc = List::Compare->new( $base, $file_requests{$file} );
        die "Non-matching requests from $file. $!"
            unless $lc->is_LequivalentR;
    }

    # Header
    print join("\t",
        'Request Name',
        (map {/(su\d+)_/} @su_files), 'Su-Avg',
        (map {/(mu\d+)_/} @mu_files), 'Mu-Avg'
    ), "\n";

    # Body
    for my $request (keys %data) {
        print $request, "\t";
        my @vals = @{ $data{$request} }{ @su_files };
        print join("\t", @vals, sprintf "%0.3f", sum(@vals)/@vals), "\t";
        @vals = @{ $data{$request} }{ @mu_files };
        print join("\t", @vals, sprintf "%0.3f", sum(@vals)/@vals), "\n";
    }

    Update: Added validation code

      Thanks a lot! I understood how you used the hash and map to get this; map was the one I didn't think of using at all. I have added some more validations and regexes to get the functionality I need. Thanks again!
Re: Parsing many files and output to one file. Pls HELP
by sanku (Beadle) on Nov 12, 2008 at 06:52 UTC
    Hi, try out this one. If you need anything else, just send me a message.
    use strict;
    use File::Basename;

    my ($i, @requestname, @values, @requestname1, %ss, $value, @value1, $vv);

    for my $path ( grep -f, </var/www/perlmonktxtfile/*> ) {
        open(FILE, "$path") or die $!;
        while (<FILE>) {
            if ($_ =~ /xml$/ && $_ =~ /Start ------/) {
                my ($vs1, $vs2, $vs3, $vs4, $vs5) = split(/:/, $_);
                push(@requestname, "$vs5");
            }
            if ($_ =~ /ProcessRequest/) {
                my ($v1, $v2, $v3, $v4, $v5, $v6, $v7) = split(/\s+/, $_);
                push(@values, "$v4\t");
            }
        }
        push(@values, "MM");
        close(FILE);
    }

    push(@requestname1, grep { !$ss{$_}++ } @requestname);
    $value = join('', @values);
    push(@value1, split(/MM/, $value));

    foreach $vv (0 .. $#requestname1) {
        $requestname1[$vv] =~ s/\s+$//g;
        print $requestname1[$vv];
        print "\t$value1[$vv]\n";
    }