in reply to Re^2: Sorting by value??
in thread Sorting by value??

Then you ought to have referenced that in the original post so I would know about it. Reviewing that thread, toolic gives some useful advice in Re^3: Sorting and Arranging data from input file. Your intended logic is still unclear to me, but I will give what guidance I can.

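I would point out that you should never use $a or $b as ordinary variables. They are the special package variables that sort uses to pass the two items being compared to its comparison block. At best, reusing them is confusing, both because the names are ambiguous and because they conflict with sort's namespace; at worst, you'll clobber a sort.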
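As a minimal sketch of that point (the sample values and the names @forms and @sorted are made up for illustration), $a and $b appear only inside the sort block, and everything else gets a descriptive name:

```perl
use strict;
use warnings;

# Hypothetical sample data, not taken from the OP's file
my @forms = qw(Q569 m737 M729);

# $a and $b are set by sort for each comparison; use them only here
my @sorted = sort { lc($a) cmp lc($b) } @forms;

# Ordinary variables get descriptive names instead of $a / $b
my $first = $sorted[0];
my $last  = $sorted[-1];

print "@sorted\n";
print "first: $first, last: $last\n";
```

Declaring a lexical with my $a in the same scope is what tends to break later sort blocks, which is why it is safest to treat both names as reserved.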

Given that what you want to do is collect results by your first field, I would suggest using a hash as your base structure. Since you want to keep more than one piece of data per key, a hash of hashes would be helpful (usually called a HoH, see perlreftut, perldsc and/or perllol). In that way, the original order is irrelevant. I would then sort the keys using a caseless alphabetical sort, as you did in Re^2: Sorting and Arranging data from input file, before outputting. Swapping your filehandles to indirect ones, using 3-argument open and testing your opens, your code might look something like:

#!/usr/bin/perl
use strict;
use warnings;
use Data::Dumper;

my $inFile  = "input file";
my $outFile = "output file";

# By putting an "or die" clause after an open, the code will exit gracefully
# with an emitted error message if a file cannot be opened for the intended
# operation
open my $fin,  "<", $inFile  or die "Open fail on $inFile: $!";
open my $fout, ">", $outFile or die "Open fail on $outFile: $!";

my %results;

print $fout "|CSED Form|OrderNumber|Date|Total Documents|Total Pages|\n";

# Process the input file 1 line at a time, storing the results in a hash of
# hashes in %results
while (my $line = <$fin>) {    # Implicit tests if $line is defined
    my @fields = split /\|/, $line;    # Split $line on |
    my $key = lc($fields[1]);    # Lowercase here so results get associated

    # $results{$key} is a hash reference, containing information on all records
    # keyed on $key (m737, q569, m729...).  After processing, the hash reference
    # in $results{$key} will have data keyed on 'lines', 'sum1' and 'sum2'

    push @{$results{$key}{lines}}, $line;
    # $results{$key}{lines} is an array reference.  As each line is encountered,
    # it is added to the end of the array.  This code uses autovivification -
    # this means that since $results{$key}{lines} is undefined when perl first
    # encounters it AND it is treated as an array reference when perl first
    # encounters it, it will be set to an empty array reference.  It is
    # roughly equivalent to being preceded by the line
    #     $results{$key}{lines} = [] if not defined $results{$key}{lines};

    $results{$key}{sum1} += $fields[-3];    # Third to last column
    # Accumulate the number in $fields[-3] in $results{$key}{sum1}

    $results{$key}{sum2} += $fields[-2];    # Second to last column
    # Accumulate the number in $fields[-2] in $results{$key}{sum2}
}

# Use a second loop to sort and output the accumulated results.  It transits all
# keys in alphabetical order since there is no sorting code block
for my $key (sort keys %results) {
    # prints the unchanged lines to the file (print in list context) at $fout
    # In particular, note all lines still contain newline characters at the ends
    print $fout @{$results{$key}{lines}};

    my $count = @{$results{$key}{lines}};    # List in scalar context yields length

    # Print accumulated results, as stored in the variables $count, $key,
    # $results{$key}{sum1}, $results{$key}{sum2}
    print $fout "There were $count ${key}s and total of 2nd to last row was $results{$key}{sum1} and total of last row was $results{$key}{sum2}\n";
}
# Handles close automatically when they go out of scope

# Output to terminal a dump of %results so operator can see internal structure
print Dumper \%results;

Terminal output:

$VAR1 = {
          'q569' => {
                      'sum1' => 1989,
                      'sum2' => 18248,
                      'lines' => [
                                   '|Q569|5555|Jun 05,2010|584|4562|
',
                                   '|Q569|5555|Jun 05,2010|345|4562|
',
                                   '|Q569|5555|Jun 05,2010|215|4562|
',
                                   '|Q569|5555|Jun 05,2010|845|4562|
'
                                 ]
                    },
          'm737' => {
                      'sum1' => 876,
                      'sum2' => 9124,
                      'lines' => [
                                   '|M737|5555|Jun 05,2010|753|4562|
',
                                   '|M737|5555|Jun 05,2010|123|4562|
'
                                 ]
                    },
          'm729' => {
                      'sum1' => 2780,
                      'sum2' => 22810,
                      'lines' => [
                                   '|M729|5652|Jun 11,2010|198|4562|
',
                                   '|M729|5876|Jun 15,2010|298|4562|
',
                                   '|M729|5726|Jun 18,2010|428|4562|
',
                                   '|M729|5147|Jun 20,2010|918|4562|
',
                                   '|M729|5632|Jun 01,2010|938|4562|
'
                                 ]
                    }
        };

input file:

|M737|5555|Jun 05,2010|753|4562|
|M737|5555|Jun 05,2010|123|4562|
|Q569|5555|Jun 05,2010|584|4562|
|Q569|5555|Jun 05,2010|345|4562|
|Q569|5555|Jun 05,2010|215|4562|
|Q569|5555|Jun 05,2010|845|4562|
|M729|5652|Jun 11,2010|198|4562|
|M729|5876|Jun 15,2010|298|4562|
|M729|5726|Jun 18,2010|428|4562|
|M729|5147|Jun 20,2010|918|4562|
|M729|5632|Jun 01,2010|938|4562|

output file:

|CSED Form|OrderNumber|Date|Total Documents|Total Pages|
|M729|5652|Jun 11,2010|198|4562|
|M729|5876|Jun 15,2010|298|4562|
|M729|5726|Jun 18,2010|428|4562|
|M729|5147|Jun 20,2010|918|4562|
|M729|5632|Jun 01,2010|938|4562|
There were 5 m729s and total of 2nd to last row was 2780 and total of last row was 22810
|M737|5555|Jun 05,2010|753|4562|
|M737|5555|Jun 05,2010|123|4562|
There were 2 m737s and total of 2nd to last row was 876 and total of last row was 9124
|Q569|5555|Jun 05,2010|584|4562|
|Q569|5555|Jun 05,2010|345|4562|
|Q569|5555|Jun 05,2010|215|4562|
|Q569|5555|Jun 05,2010|845|4562|
There were 4 q569s and total of 2nd to last row was 1989 and total of last row was 18248

Update: At the OP's request via msg, I added significant in-line documentation for the above code as well as the trailing Dumper code.

Re^4: Sorting by value??
by Homerhrdz (Initiate) on Jun 15, 2010 at 22:06 UTC
    OK, here's what I have so far:
    #!/usr/bin/perl
    my $inFile = "Report.txt";
    my $outFile = "SortedReport.txt";
    open IN, "< $inFile";
    open OUT, ">> $outFile";
    my @not_sorted = <IN>;
    print OUT "|CSED Form|OrderNumber|Date|Total Documents|Total Pages|\n";
    @sorted = sort { lc($a) cmp lc($b) } @not_sorted; # alphabetical sort
    my $count = 1;
    foreach(@sorted) {
    $frmType = substr $_, 1, 3;
    if ($frmType = "M729") {
    print OUT "$_";
    $count ++;
    } else {
    print OUT "There were $count Monthly 729 runs\n";
    }
    }
    foreach(@sorted) {
    $frmType = substr $_, 1, 3;
    if ($frmType = "M737") {
    print OUT "$_";
    $count ++;
    } else {
    print OUT "there were $count Monthly 737 runs\n";
    }
    }
    $count = 0;
    foreach(@sorted) {
    $frmType = substr $_, 1, 3;
    if ($frmType = "Q569") {
    print OUT "$_";
    $count ++;
    } else {
    print OUT "there were $count Quarterly 569 runs\n";
    }
    }
    close OUT;
    close IN;

      In your code, if ($frmType = "M729") and the other "if"s like it will always be true, because you are assigning, not comparing.

      You also cannot compare strings with the numeric test '=='; you'll have to use 'eq' instead, e.g.: if ($frmType eq "M729")
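      A small sketch of the difference between the three operators, using a made-up value for $frmType (the variables $copy and the *_result names are only for illustration). With warnings enabled, perl will typically also complain about the assignment inside the condition:

```perl
use strict;
use warnings;

my $frmType = "Q569";    # hypothetical value that should NOT match "M729"

# Assignment: the condition tests the assigned value "M729", which is true
my $copy = $frmType;
my $assign_result = ($copy = "M729") ? "true" : "false";

# Numeric comparison: both strings numify to 0, so they compare as equal
my $numeric_result;
{
    no warnings 'numeric';    # silence "isn't numeric" just for this demo
    $numeric_result = ($frmType == "M729") ? "true" : "false";
}

# String comparison: the test that was intended
my $eq_result = ($frmType eq "M729") ? "true" : "false";

print "=  gives $assign_result\n";
print "== gives $numeric_result\n";
print "eq gives $eq_result\n";
```

      Only the eq test reports that "Q569" and "M729" differ.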

      Once you fix the problem that was explained by ahmad above, you will discover the next problem with your code: now all the "if" conditions will come out false every time, because...
      ... $frmType = substr $_, 1, 3; if ($frmType eq "M729") { ...
      The substr() call assigns a 3-character string to $frmType, and you are then testing whether it is identical to a 4-character string ("M729" in this case). That comparison can never succeed.
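      To make the length mismatch concrete, here is a short sketch on one record in the format of the sample input. The split-based alternative is only a suggestion and assumes the form code is always the first field after the leading pipe:

```perl
use strict;
use warnings;

my $line = "|M729|5652|Jun 11,2010|198|4562|\n";

# Offset 1, length 3: one character short of the form code
my $three = substr $line, 1, 3;

# Offset 1, length 4: the full form code
my $four = substr $line, 1, 4;

# Alternative that does not depend on the code's length:
# field 0 is the empty string before the leading pipe
my $field = (split /\|/, $line)[1];

print "3 chars: $three\n";
print "4 chars: $four\n";
print "split:   $field\n";
```

      Splitting on the delimiter keeps working if a form code ever has a different length, whereas a hard-coded substr length does not.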

      Also, proper indentation would help a lot. Please give that a try.

      First, apply the suggestions from toolic, ahmed and graff. In particular, please fix your indentation - as toolic said, perltidy will do that for you.

      In addition:

      1. You initialize $count to 1 to start, which will give you an off-by-one error. You mean 0.
      2. You do not reinitialize $count before your M737 loop. That means this will count both M729 and M737.
      3. Since your print statement is inside your for loop, you will print your results string many times - 6, 9, and 7 times respectively for the provided input. In addition, your output will be wrong every time for Q569 as all your prints will happen before you process any of your Q569s - it will print 0 each time.
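      The three points above can be sketched as follows. This is only an illustration on a few made-up, already-sorted records; @report and the outer loop over form codes are assumptions, not a rewrite of your script:

```perl
use strict;
use warnings;

# Hypothetical, already-sorted sample records
my @sorted = (
    "|M729|5652|Jun 11,2010|198|4562|\n",
    "|M729|5876|Jun 15,2010|298|4562|\n",
    "|M737|5555|Jun 05,2010|753|4562|\n",
    "|Q569|5555|Jun 05,2010|584|4562|\n",
);

my @report;
for my $form (qw(M729 M737 Q569)) {
    my $count = 0;    # points 1 and 2: a fresh counter per form, starting at 0
    for my $line (@sorted) {
        next unless substr($line, 1, 4) eq $form;
        push @report, $line;
        $count++;
    }
    # point 3: the summary is emitted once, after the inner loop has finished
    push @report, "There were $count $form runs\n";
}

print @report;
```

      Collecting into @report instead of printing to OUT keeps the sketch self-contained; in the real script the push calls would be print OUT statements.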
      Post your code again if you have implemented all of these changes and it still doesn't function as expected.
        Once again, I appreciate your help a lot, guys; I'm still fairly new to Perl. OK, here is the code with the changes made, plus another bit of code I have added:

        #!/usr/bin/perl
        use warnings;
        use strict;
        my $inFile = "UnSortReport.txt";
        my $outFile = "SortedReport.txt";
        open IN, "< $inFile";
        open OUT, ">> $outFile";
        my @not_sorted = <IN>;
        print OUT "|CSED Form|OrderNumber|Date|Total Documents|Total Pages|\n";
        @sorted = sort { lc($a) cmp lc($b) } @not_sorted; # alphabetical sort
        my $count = 0;
        foreach(@sorted) {
        $frmType = substr $_, 1, 4;
        if ($frmType eq "M729") {
        print OUT "$_";
        $count ++;
        } else {
        print OUT "There were $count Monthly 729 runs\n";
        }
        }
        $count = 0;
        foreach(@sorted) {
        $frmType = substr $_, 1, 4;
        if ($frmType eq "M737") {
        print OUT "$_";
        $count ++;
        } else {
        print OUT "There were $count Monthly 737 runs\n";
        }
        }
        $count = 0;
        foreach(@sorted) {
        $frmType = substr $_, 1, 4;
        if ($frmType eq "Q569") {
        print OUT "$_";
        $count ++;
        } else {
        print OUT "There were $count Quarterly 569 runs\n";
        }
        }
        close OUT;
        close IN;
        ##############The code Bellow looks a the SortedReport.txt and checks for duplicate lines and removes them if any############################
        my $file = 'SortedReport.txt';
        my %seen = ();
        {
        local @ARGV = ($file);
        local $^I = '.bac';
        while(<>) {
        $seen{$_}++;
        next if $seen{$_} > 1;
        print;
        }
        }
        print "finished processing file.\n";
Re^4: Sorting by value??
by Homerhrdz (Initiate) on Jun 16, 2010 at 16:46 UTC
    Hey, thanks! I did not see this before. That worked wonderfully; I did not know you could go that route with the code. Thanks.