Re^3: Memory usage while tallying instances of lines in a .txt file

The following code does what you want: "Strand" can be at any position on the first line, and the file is processed one line at a time, which avoids the memory overhead of reading in the whole file at once.

use warnings;
use strict;

use Data::Dumper;
use List::Util qw(first);

my %hits;
my $index;

open my $fh, '<', 'file.txt' or die "Cannot open file.txt: $!";

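while (<$fh>){
    chomp;
    my @F = split ' ';    # split on runs of whitespace

    # Header line: record which column holds "Strand", then move on.
    if (/Strand/){
        $index = first { $F[$_] eq 'Strand' } 0..$#F;
        next;
    }

    # The two columns after "Strand" form the key; start both strand counters at zero.
    if (! exists $hits{$F[$index+1]}{$F[$index+2]}) {
        $hits{$F[$index+1]}{$F[$index+2]}{'w'} = 0;
        $hits{$F[$index+1]}{$F[$index+2]}{'c'} = 0;
    }
    # Count this line's strand value under its key.
    $hits{$F[$index+1]}{$F[$index+2]}{$F[$index]}++;
}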

print Dumper \%hits;

Data used:

Strand
1   4   1   0
1   5   1   0
1   31  1   0
1   74  1   0
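
For anyone who wants the same tally in a reusable form, here is a minimal sketch, not a drop-in replacement. It assumes the first non-blank line is the header (the code above instead treats any line matching /Strand/ as the header), it skips rows too short to hold the strand column plus the two key columns, and it needs Perl 5.10 or later for `//=`. The sub name `tally_strands` is made up for illustration.

```perl
use strict;
use warnings;
use List::Util qw(first);

# Hypothetical helper (not from the original post): tally strand values
# per pair of key columns, reading the handle one line at a time.
sub tally_strands {
    my ($fh) = @_;
    my (%hits, $index);

    while (my $line = <$fh>) {
        chomp $line;
        my @F = split ' ', $line;
        next unless @F;    # skip blank lines

        if (!defined $index) {
            # Assumption: the first non-blank line is the header.
            $index = first { $F[$_] eq 'Strand' } 0 .. $#F;
            die "No 'Strand' column found in header\n" unless defined $index;
            next;
        }

        # Skip rows too short to hold the strand plus the two key columns.
        next if $#F < $index + 2;

        my ($strand, $k1, $k2) = @F[ $index .. $index + 2 ];
        $hits{$k1}{$k2} //= { w => 0, c => 0 };    # start both counters at zero
        $hits{$k1}{$k2}{$strand}++;
    }

    return \%hits;
}
```

Pass it any open read handle, e.g. the `$fh` opened on 'file.txt' above, and feed the returned hash reference to Dumper. Only the current line and the tally hash are held in memory, so usage grows with the number of distinct keys rather than with the file size.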

Re^4: Memory usage while tallying instances of lines in a .txt file by TJCooper (Beadle) on Dec 05, 2016 at 18:42 UTC
I'm not sure where I was reading the entire file into memory. Shouldn't these lines of code only handle the first line of the input file (given that they occur outside of the while-loop)?

my @headers = split("\t",<$IN>);
my $index = first{$headers[$_] eq 'Strand'} 0..$#headers;

Indeed, your approach does not reduce RAM requirement.