Then you ought to have referenced that in the original post so I would know about it. Reviewing that thread, toolic gives some useful advice in Re^3: Sorting and Arranging data from input file. Your intended logic is still unclear to me, but I will give what guidance I can.
I would point out you should never use $a or $b as ordinary variables. They are the special package variables that sort uses for its comparisons, so at best the names are confusing, and at worst you'll clobber a sort.
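As a small illustration of where $a and $b do belong, here is a sketch of a caseless sort. The sub name caseless_sort and the sample list are made up for this example and are not from your code. Inside the sort block, perl sets $a and $b for each comparison; declaring a lexical "my $a" in the same scope would shadow them and break the comparison.

```perl
use strict;
use warnings;

# $a and $b are package variables that sort sets for each comparison,
# which is why they need no "my" even under strict.
# caseless_sort is a hypothetical helper name, not from the original code.
sub caseless_sort {
    my @names = @_;    # descriptive lexical name, not $a or $b
    # Compare case-insensitively; break ties with a plain string compare
    return sort { lc($a) cmp lc($b) or $a cmp $b } @names;
}

my @forms = caseless_sort(qw(q569 M737 m729 Q569));
print "@forms\n";    # prints "m729 M737 Q569 q569"
```

Every other variable gets a descriptive lexical name, so nothing competes with sort for $a and $b.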
Given that what you want to do is collect results by your first field, I would suggest using a hash as your base structure. Since you want to keep more than one piece of data per key, a hash of hashes would be helpful (usually called a HoH, see perlreftut, perldsc and/or perllol). In that way, the original order is irrelevant. I would then sort the keys using a caseless alphabetical sort, as you did in Re^2: Sorting and Arranging data from input file, before outputting. Swapping your filehandles to indirect ones, using 3-argument open and testing your opens, your code might look something like:
#!/usr/bin/perl
use strict;
use warnings;
use Data::Dumper;
my $inFile = "input file";
my $outFile = "output file";
# By putting an "or die" clause after an open, the code will exit gracefully
# with an emitted error message if a file cannot be opened for the intended
# operation
open my $fin, "<", $inFile or die "Open fail on $inFile: $!";
open my $fout, ">", $outFile or die "Open fail on $outFile: $!";
my %results;
print $fout "|CSED Form|OrderNumber|Date|Total Documents|Total Pages|\n";
# Process the input file 1 line at a time, storing the results in a hash of
# hashes in %results
while (my $line = <$fin>) { # Implicitly tests whether $line is defined
    my @fields = split /\|/, $line; # Split $line on |
    # The leading | yields an empty $fields[0], so the form is $fields[1]
    my $key = lc($fields[1]); # Lowercase here so results get associated
                              # regardless of case
    # $results{$key} is a hash reference, containing information on all records
    # keyed on $key (m737, q569, m729...).  After processing, the hash reference
    # in $results{$key} will have data keyed on 'lines', 'sum1' and 'sum2'
    push @{$results{$key}{lines}}, $line;
    # $results{$key}{lines} is an array reference.  As each line is encountered,
    # it is added to the end of the array.  This code uses autovivification -
    # this means that since $results{$key}{lines} is undefined when perl first
    # encounters it AND it is treated as an array reference when perl first
    # encounters it, it will be set to an empty array reference.  It is
    # roughly equivalent to being preceded by the line
    #     $results{$key}{lines} = [] if not defined $results{$key}{lines};
    # Note the newline after the final | is the last element of @fields, so
    # the two numeric columns sit at indices -3 and -2
    $results{$key}{sum1} += $fields[-3]; # Third to last element
    # Accumulate the number in $fields[-3] in $results{$key}{sum1}
    $results{$key}{sum2} += $fields[-2]; # Second to last element
    # Accumulate the number in $fields[-2] in $results{$key}{sum2}
}
# Use a second loop to sort and output the accumulated results.  It transits all
# keys in alphabetical order since there is no sorting code block
for my $key (sort keys %results) {
    # Prints the unchanged lines to the file (print in list context) at $fout
    # In particular, note all lines still contain newline characters at the ends
    print $fout @{$results{$key}{lines}};
    my $count = @{$results{$key}{lines}}; # Array in scalar context yields length
    # Print accumulated results, as stored in the variables $count, $key,
    # $results{$key}{sum1}, $results{$key}{sum2}
    print $fout "There were $count ${key}s and total of 2nd to last row was $results{$key}{sum1} and total of last row was $results{$key}{sum2}\n";
}
# Handles close automatically when they go out of scope
# Output to terminal a dump of %results so operator can see internal structure
print Dumper \%results;
Terminal output:
$VAR1 = {
'q569' => {
'sum1' => 1989,
'sum2' => 18248,
'lines' => [
'|Q569|5555|Jun 05,2010|584|4562|
',
'|Q569|5555|Jun 05,2010|345|4562|
',
'|Q569|5555|Jun 05,2010|215|4562|
',
'|Q569|5555|Jun 05,2010|845|4562|
'
]
},
'm737' => {
'sum1' => 876,
'sum2' => 9124,
'lines' => [
'|M737|5555|Jun 05,2010|753|4562|
',
'|M737|5555|Jun 05,2010|123|4562|
'
]
},
'm729' => {
'sum1' => 2780,
'sum2' => 22810,
'lines' => [
'|M729|5652|Jun 11,2010|198|4562|
',
'|M729|5876|Jun 15,2010|298|4562|
',
'|M729|5726|Jun 18,2010|428|4562|
',
'|M729|5147|Jun 20,2010|918|4562|
',
'|M729|5632|Jun 01,2010|938|4562|
'
]
}
};
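The key order in the dump above is whatever perl's internal hash order happens to be, so it can differ between runs. If a stable dump is easier to read, one option is the $Data::Dumper::Sortkeys setting. The following is a minimal sketch using a toy stand-in for %results that holds only the sums shown above, not the full structure.

```perl
use strict;
use warnings;
use Data::Dumper;

# Sort hash keys in the dump so repeated runs produce identical output
$Data::Dumper::Sortkeys = 1;

# Toy stand-in for %results, holding only the sums from the dump above
my %results = (
    q569 => { sum1 => 1989, sum2 => 18248 },
    m737 => { sum1 => 876,  sum2 => 9124 },
);

my $dump = Dumper(\%results);
print $dump;    # 'm737' is now always listed before 'q569'
```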
input file:
|M737|5555|Jun 05,2010|753|4562|
|M737|5555|Jun 05,2010|123|4562|
|Q569|5555|Jun 05,2010|584|4562|
|Q569|5555|Jun 05,2010|345|4562|
|Q569|5555|Jun 05,2010|215|4562|
|Q569|5555|Jun 05,2010|845|4562|
|M729|5652|Jun 11,2010|198|4562|
|M729|5876|Jun 15,2010|298|4562|
|M729|5726|Jun 18,2010|428|4562|
|M729|5147|Jun 20,2010|918|4562|
|M729|5632|Jun 01,2010|938|4562|
output file:
|CSED Form|OrderNumber|Date|Total Documents|Total Pages|
|M729|5652|Jun 11,2010|198|4562|
|M729|5876|Jun 15,2010|298|4562|
|M729|5726|Jun 18,2010|428|4562|
|M729|5147|Jun 20,2010|918|4562|
|M729|5632|Jun 01,2010|938|4562|
There were 5 m729s and total of 2nd to last row was 2780 and total of last row was 22810
|M737|5555|Jun 05,2010|753|4562|
|M737|5555|Jun 05,2010|123|4562|
There were 2 m737s and total of 2nd to last row was 876 and total of last row was 9124
|Q569|5555|Jun 05,2010|584|4562|
|Q569|5555|Jun 05,2010|345|4562|
|Q569|5555|Jun 05,2010|215|4562|
|Q569|5555|Jun 05,2010|845|4562|
There were 4 q569s and total of 2nd to last row was 1989 and total of last row was 18248
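One detail worth knowing about the script: the negative indices work because split leaves the trailing newline as a final element, which puts the page count at $fields[-2] rather than $fields[-1]. A possible variation, assuming every record has exactly the five columns shown in the input file, is to chomp the line and give the fields names. The helper parse_record and its hash keys are illustrative only.

```perl
use strict;
use warnings;

# parse_record is a hypothetical helper; it assumes the five-column
# layout |Form|Order|Date|Documents|Pages| from the sample input.
sub parse_record {
    my ($line) = @_;
    chomp $line;    # drop the newline so it is not treated as a field
    # The leading | produces an empty first field, discarded via undef
    my (undef, $form, $order, $date, $docs, $pages) = split /\|/, $line;
    return {
        form  => lc $form,    # lowercase, as the script does for its keys
        order => $order,
        date  => $date,
        docs  => $docs,
        pages => $pages,
    };
}

my $rec = parse_record("|M737|5555|Jun 05,2010|753|4562|\n");
print "$rec->{form}: $rec->{docs} documents, $rec->{pages} pages\n";
# prints "m737: 753 documents, 4562 pages"
```

With named fields the accumulation lines would read $results{$key}{sum1} += $rec->{docs}, which no longer depends on whether the line ends in a newline.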
Update: At the OP's request via msg, I added significant in-line documentation for the above code as well as the trailing Dumper code.