hiradhu has asked for the wisdom of the Perl Monks concerning the following question:

Hi Perl Monks, this is my second attempt. I received good suggestions from a Perl Monk earlier, but I don't know how to apply them. Can you please help me again? Here is what I need: I have multiple files (the number is only known at runtime) named "su1_file.txt" etc., with the following format.
10/22/2008 00:22:01	Start testing --- File: Open Mode: Single User, RPVersion: 2
10/22/2008 00:22:01	Start ------- 1:1:11002_CustomerAdd_WithNotes_Rq.xml
10/22/2008 00:22:04	New QBXMLRP	0.328	65.03	0.00	0.00
10/22/2008 00:22:06	OpenConnection	1.031	65.03	0.00	0.00
10/22/2008 00:22:08	BeginSession	0.453	65.03	0.00	0.00
10/22/2008 00:22:14	ProcessRequest	4.984	65.46	0.43	0.83
10/22/2008 00:22:15	EndSession	0.016	65.46	0.00	0.00
10/22/2008 00:22:15	End --------- 1:1:11002_CustomerAdd_WithNotes_Rq.xml
10/22/2008 00:22:15	Start ------- 1:1:11004_AccountAdd_1_Rq.xml
10/22/2008 00:22:16	BeginSession	0.031	65.46	0.00	0.00
10/22/2008 00:22:17	ProcessRequest	0.422	65.57	0.10	0.59
10/22/2008 00:22:18	EndSession	0.000	65.57	0.00	0.00
10/22/2008 00:22:18	End --------- 1:1:11004_AccountAdd_1_Rq.xml
10/22/2008 00:22:18	Start ------- 1:1:11007_EmployeeAdd_Rq.xml
10/22/2008 00:22:20	BeginSession	0.094	65.57	0.00	0.00
10/22/2008 00:22:21	ProcessRequest	0.766	65.55	-0.02	0.88
10/22/2008 00:22:22	EndSession	0.016	65.55	0.00	0.00
10/22/2008 00:22:22	End --------- 1:1:11007_EmployeeAdd_Rq.xml
I have to parse through the files and get only the XML names and the first number after ProcessRequest for each XML. My code for that goes like this:
while (<INPUT>) {
    chomp($_);
    # Look for title line
    if ($_ =~ /Start -/) {
        my @sdk_line = split(/\t/, $_);
        push(@temp, substr($sdk_line[1], 18));
    }
}
while (<INPUT>) {
    chomp($_);
    # find process time for that title
    if ($_ =~ /Process/) {
        print split(/\t/, $_), "\n";
        my @sdk_line = split(/\t/, $_);
        push(@temp, $sdk_line[2]);
    }
}
I have to write this to an output file in the format below. SU1 and SU2 are values from the files su1_file.txt and su2_file.txt, which are tab-separated. The output should also be tab-separated so that I can open it in Excel.
Request Name	SU1	SU2	SU3	SU-Avg	MU1	MU2	MU3	MU-Avg
11002_CustomerAdd_WithNotes_Rq.xml	4.984	6.766	6.766	2.645	2.141	6.125	6.766	3.006
11004_AccountAdd_1_Rq.xml	0.422	1.203	1.203	0.404	0.297	1.062	1.203	0.512
11007_EmployeeAdd_Rq.xml	0.766	0.359	0.359	0.212	0.250	0.281	0.359	0.178
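As a minimal sketch of producing one such tab-separated row (the request name and values below are made-up illustrations, not taken from the real log files), `join "\t"` plus `sprintf` handles both the separators and the rounded average:

```perl
use strict;
use warnings;
use List::Util 'sum';

# One output row: request name, the per-file values, and their average,
# joined with tabs so the file opens cleanly in a spreadsheet.
my $name = '11004_AccountAdd_1_Rq.xml';   # example values, not real data
my @su   = (0.422, 1.203, 1.203);
my $avg  = sprintf "%.3f", sum(@su) / @su;
print join("\t", $name, @su, $avg), "\n";
```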
I have hardcoded my script to parse only 3 files. How do I do it for a multiple/unknown number of files? This is my code:
## Get the XML names ##
sub GetXMLNames {
    open(INPUT, $inputPath);
    my @temp_xml = ();
    while (<INPUT>) {
        chomp($_);
        # Look for title line
        if ($_ =~ /Start -/) {
            my @sdk_line = split(/\t/, $_);
            push(@temp, substr($sdk_line[1], 18));
        }
    }
    close INPUT;
    return @temp;
}

## Get the Process time ##
sub GetOnlyProcessTime {
    open(LOG, ">>$parseSDKPerfResultsLog");
    open(INPUT, $inputPath);
    my @temp = ();
    while (<INPUT>) {
        chomp($_);
        # find process time for that title
        if ($_ =~ /Process/) {
            my @sdk_line = split(/\t/, $_);
            push(@temp, $sdk_line[2]);
        }
    }
    my $number = scalar(@temp);
    if ($number < $Total_number_of_requests) {
        my $missing = $Total_number_of_requests - $number;
        print LOG "$missing request(s) in $inputPath have not got executed. Hence exiting.\n";
        close LOG;
        close INPUT;
        exit;
    }
    close INPUT;
    return @temp;
}

## Parse all the logs ##
sub parseSDKPerfmonLogs {
    open(LOG, ">>$parseSDKPerfResultsLog");
    for (my $su = 1; $su <= $su_files; $su++) {
        $su_inputfile = "su" . $su . "_sdkperfmonlog.txt";
        push(@su_inputfile, $su_inputfile);
    }
    for (my $mu = 1; $mu <= $mu_files; $mu++) {
        $mu_inputfile = "mu" . $mu . "_sdkperfmonlog.txt";
        push(@mu_inputfile, $mu_inputfile);
    }

    ############# Get the Request file names in the first column #############
    $inputPath = $ResFolderPath . "\\su" . $su_files . "_sdkperfmonlog.txt";
    @xml_names = GetXMLNames($inputPath);
    $Total_number_of_requests = @xml_names;
    ############# end of Get the request names in results file #############

    ############# Get the SU process time #############
    foreach $sufile (@su_inputfile) {
        #print $sufile;
        $inputPath = $ResFolderPath . "\\" . $sufile;
        $filenumber = substr($sufile, 2, 1);
        @temp = GetOnlyProcessTime($inputPath);
        if ($filenumber <= $su_files) {
            if    ($filenumber == 1) { @su_first  = @temp; }
            elsif ($filenumber == 2) { @su_second = @temp; }
            elsif ($filenumber == 3) { @su_third  = @temp; }
        }
    }
    ############# End of Get the SU process time #############

    ############# Get the MU process time #############
    foreach $mufile (@mu_inputfile) {
        #print $sufile;
        $inputPath = $ResFolderPath . "\\" . $mufile;
        $filenumber = substr($mufile, 2, 1);
        @temp = GetOnlyProcessTime($inputPath);
        if ($filenumber <= $mu_files) {
            if    ($filenumber == 1) { @mu_first  = @temp; }
            elsif ($filenumber == 2) { @mu_second = @temp; }
            elsif ($filenumber == 3) { @mu_third  = @temp; }
        }
    }
    ############# End of Get the MU process time #############

    ############# Write everything to the output file #############
    open(OUTPUT, ">>$outputFile");
    for (my $i = 0; $i < $Total_number_of_requests; $i++) {
        my $su_average = sprintf("%.3f", ($su_first[$i] + $su_second[$i] + $su_third[$i]) / $su_files);
        my $mu_average = sprintf("%.3f", ($mu_first[$i] + $mu_second[$i] + $mu_third[$i]) / $mu_files);
        print(OUTPUT "$xml_names[$i]\t$su_first[$i]\t$su_second[$i]\t$su_third[$i]\t$su_average\t$mu_first[$i]\t$mu_second[$i]\t$mu_third[$i]\t$mu_average\n");
    }
    print LOG "All the values from performance logs are available in $outputFile\n";
    close OUTPUT;
    ############# End of Write everything to the output file #############
}

Replies are listed 'Best First'.
Re: Parsing many files and output to one file. Pls HELP
by GrandFather (Saint) on Nov 11, 2008 at 07:48 UTC

    Addressing just the point at issue and ignoring the vast majority of your "sample" code, the following demonstrates how you can handle an arbitrary number of columns (the equivalent of your multiple input files, perhaps) over an arbitrary number of rows by using an array of arrays:

    use strict;
    use warnings;

    my @rows;

    while (<DATA>) {
        chomp;
        next unless length;
        my @columns = split ' ';
        push @rows, \@columns;
    }

    printf " %2d ", $_ for 1 .. @{$rows[0]};
    print " sum\n";

    for my $row (@rows) {
        my $sum;
        $sum += $_ for @$row;
        printf " %3d", $_ for @$row;
        printf " %4d\n", $sum;
    }

    __DATA__
    218 156 350 994 137 729 656 977 582 957
    80 10 686 679 881 486 272 927 971 153
    136 694 724 326 536 99 620 564 290 703
    402 835 397 291 886 714 580 861 80 634
    651 144 787 722 125 397 323 261 969 782

    Prints:

      1   2   3   4   5   6   7   8   9  10  sum
     218 156 350 994 137 729 656 977 582 957 5756
      80  10 686 679 881 486 272 927 971 153 5145
     136 694 724 326 536  99 620 564 290 703 4692
     402 835 397 291 886 714 580 861  80 634 5680
     651 144 787 722 125 397 323 261 969 782 5161

    Perl reduces RSI - it saves typing
Re: Parsing many files and output to one file. Pls HELP
by moritz (Cardinal) on Nov 11, 2008 at 07:21 UTC
    while (<INPUT>) {
        chomp($_);
        # Look for title line
        if ($_ =~ /Start -/) {
            my @sdk_line = split(/\t/, $_);
            push(@temp, substr($sdk_line[1], 18));
        }
    }
    while (<INPUT>) {
        chomp($_);
        # find process time for that title
        if ($_ =~ /Process/) {
            print split(/\t/, $_), "\n";
            my @sdk_line = split(/\t/, $_);
            push(@temp, $sdk_line[2]);
        }
    }

    The second while loop will never be executed, because the first one exhausts all lines in the INPUT file.
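    One way around that is a single loop that handles both patterns in one pass. A minimal sketch, using two sample lines adapted from the question's log format (read here from an in-memory string rather than a real file):

```perl
use strict;
use warnings;

# Sample lines in the tab-separated layout shown in the question
my $log = "10/22/2008 00:22:15\tStart ------- 1:1:11004_AccountAdd_1_Rq.xml\n"
        . "10/22/2008 00:22:17\tProcessRequest\t0.422\t65.57\t0.10\t0.59\n";
open my $fh, '<', \$log or die $!;

my @temp;
while (my $line = <$fh>) {
    chomp $line;
    my @sdk_line = split /\t/, $line;
    if ($line =~ /Start -/) {
        push @temp, substr($sdk_line[1], 18);  # drop "Start ------- 1:1:"
    }
    elsif ($line =~ /ProcessRequest/) {
        push @temp, $sdk_line[2];              # first number after the name
    }
}
print join("\t", @temp), "\n";
```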

    Also please indent your code in a way that makes it obvious to see which loop ends where.

    I have hardcoded my script to parse only 3 files. How do I do it for a multiple/unknown number of files?

    By iterating over the file names, open each, process it, and close it again. Something like this:

    for my $fn (@file_names) {
        open my $handle, '<', $fn or die "Can't open `$fn' for reading: $!";
        # do something with the file here
        # and write to output file
        close $handle;
    }
      Thanks. Those two while loops are from different subs. I have parsed 6 files (to get the XML names and the process times) and have them in arrays. My program works fine for 6 files. Now, if I have more than 6 files (n files, an unknown number), how do I dynamically name arrays, say $myArray$i? I cannot read one file at a time and output it: the output file should be printed column-wise, with each column coming from one input file.
        Instead of naming them $myArray1, $myArray2 store them all in a common array. How that works is described in perllol and perlreftut, as well as in every good Perl book.
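        A minimal sketch of that idea (the inner lists here are made-up stand-ins for each file's ProcessRequest times): instead of @myArray1, @myArray2, ..., push one array reference per file into a single top-level array, then take column-wise slices with map.

```perl
use strict;
use warnings;

# One anonymous array per input file, all held in a single array of arrays.
my @per_file_times;
for my $file_times ([4.984, 0.422, 0.766], [6.766, 1.203, 0.359]) {
    push @per_file_times, $file_times;   # $per_file_times[$i] is file i's column
}

# Column-wise access: the value for request 0 from every file
my @row = map { $_->[0] } @per_file_times;
print join("\t", @row), "\n";
```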
Re: Parsing many files and output to one file. Pls HELP
by Cristoforo (Curate) on Nov 11, 2008 at 21:17 UTC
    I think the problem can be solved by the code below for any number of su and mu files. It doesn't check that every file has the same number of requests (you would need to add a test for that). Also, the glob function assumes all the files are in the current working directory; if that's not the case, a path would need to be prepended to the glob expression. I used Sort::Naturally to sort the respective files saved in the arrays and List::Util for the sum function.

    Chris

    #!/usr/bin/perl
    use strict;
    use warnings;
    use Sort::Naturally;
    use List::Util 'sum';
    use List::Compare;

    my %data;
    my %file_requests;
    my $base;

    my @su_files = nsort glob "su[1-9]*_sdkperfmonlog.txt";
    my @mu_files = nsort glob "mu[1-9]*_sdkperfmonlog.txt";

    for my $file (@su_files, @mu_files) {
        my $request;
        open my $fh, "<", $file or die "Unable to open $file for reading. $!";
        while (<$fh>) {
            if (/Start\s+-+\s+\d+:\d+:(.+)/) {
                $request = $1;
            }
            elsif (/ProcessRequest\t([.\d]+)/) {
                $data{$request}{$file} = $1;
                push @{ $file_requests{$file} }, $request;
            }
        }
        close $fh or die "Unable to close $file. $!";

        # validate
        $base ||= $file_requests{$file};
        my $lc = List::Compare->new( $base, $file_requests{$file} );
        die "Non-matching requests from $file. $!"
            unless $lc->is_LequivalentR;
    }

    # Header
    print join("\t",
        'Request Name',
        (map {/(su\d+)_/} @su_files), 'Su-Avg',
        (map {/(mu\d+)_/} @mu_files), 'Mu-Avg'
    ), "\n";

    # Body
    for my $request (keys %data) {
        print $request, "\t";
        my @vals = @{ $data{$request} }{ @su_files };
        print join("\t", @vals, sprintf "%0.3f", sum(@vals)/@vals), "\t";
        @vals = @{ $data{$request} }{ @mu_files };
        print join("\t", @vals, sprintf "%0.3f", sum(@vals)/@vals), "\n";
    }

    Update: Added validation code

      Thanks a lot! I understood how you used the hash and map to get this; map was the one I didn't think of using at all. I have added some more validations and regexes to get the functionality I need. Thanks again!
Re: Parsing many files and output to one file. Pls HELP
by sanku (Beadle) on Nov 12, 2008 at 06:52 UTC
    Hi, try out this one. If you need anything else, just send me a message.
    use strict;
    use File::Basename;

    my ($i, @requestname, @values, @requestname1, %ss, $value, @value1, $vv);

    for my $path ( grep -f, </var/www/perlmonktxtfile/*> ) {
        open(FILE, "$path") or die $!;
        while (<FILE>) {
            if ($_ =~ /xml$/ && $_ =~ /Start ------/) {
                my ($vs1, $vs2, $vs3, $vs4, $vs5) = split(/:/, $_);
                push(@requestname, "$vs5");
            }
            if ($_ =~ /ProcessRequest/) {
                my ($v1, $v2, $v3, $v4, $v5, $v6, $v7) = split(/\s+/, $_);
                push(@values, "$v4\t");
            }
        }
        push(@values, "MM");
        close(FILE);
    }

    push(@requestname1, grep { !$ss{$_}++ } @requestname);
    $value = join('', @values);
    push(@value1, split(/MM/, $value));

    foreach $vv (0 .. $#requestname1) {
        $requestname1[$vv] =~ s/\s+$//g;
        print $requestname1[$vv];
        print "\t$value1[$vv]\n";
    }