The following code produces identical results to choroba's code but uses less than 1/4 of the memory (180MB vs 795MB for my test dataset) and runs more quickly:
    #! perl -slw
    use strict;
    use List::Util qw[ first ];

    # Read the header line and locate the 'Strand' column.
    my @headers = split ' ', scalar <>;
    my $f = first { $headers[$_] eq 'Strand' } 0 .. $#headers;

    # Two packed byte-strings hold the counts; %index maps each
    # (key, subkey) pair to its slot number in those strings.
    my( $cCounts, $wCounts, $n, %index ) = ( '', '', 0 );

    while( <> ) {
        chomp;
        my @F = split ' ';
        my $index = $index{ $F[ $f + 1 ] }{ $F[ $f + 2 ] } //= $n++;
        ++vec( $F[ $f ] eq 'w' ? $wCounts : $cCounts, $index, 8 );
    }

    while( my( $key, $subhash ) = each %index ) {
        while( my( $subkey, $index ) = each %{ $subhash } ) {
            print join "\t", $key, $subkey,
                vec( $cCounts, $index, 8 ), vec( $wCounts, $index, 8 );
        }
    }
    __END__

Run it as:

    1177246.pl 1177246.dat > 1177246.out
It assumes no count will exceed 255, the maximum an 8-bit vec element can hold. If that's too small, change the three 8s to 16s; that doubles the size of the two count strings, which is still a small increase relative to the overall memory consumption.
In reply to Re^3: Memory usage while tallying instances of lines in a .txt file
by BrowserUk
in thread Memory usage while tallying instances of lines in a .txt file
by TJCooper