Given input data in the form of:

c 8 336158 75 75M 74
c 12 828707 74 74M 73
w 10 528559 74 74M 0
c 15 267766 74 74M 73
c 12 828707 74 74M 73
c 14 491797 74 74M 73

I am trying to tally the instances of records based on column 1 (which has the header 'Strand'; its position can vary, hence the use of first from List::Util) as well as columns 2 and 3. The main chunk of code that accomplishes this is simply:

my @headers = split("\t", <$IN>);
my $index = first { $headers[$_] eq 'Strand' } 0..$#headers;
while (<$IN>) {
    chomp $_;
    my @F = split("\t", $_);
    # Seed both strand counts with zero the first time a column 2/column 3 pair is seen
    unless (exists $hits{$F[$index+1]}{$F[$index+2]}) {
        $hits{$F[$index+1]}{$F[$index+2]}{'w'} = 0;
        $hits{$F[$index+1]}{$F[$index+2]}{'c'} = 0;
    }
    $hits{$F[$index+1]}{$F[$index+2]}{$F[$index]}++;
}

This is then printed in a simple manner to form files like these:
1 4 1 0
1 5 1 0
1 31 1 0
1 74 1 0
1 89 1 0
1 116 1 1
1 118 1 0
1 122 1 0
1 126 0 1
1 140 0 1
1 141 0 1
1 148 2 0
1 158 0 1
1 159 1 0

That is, columns 2 and 3 of the input, followed by the frequency counts for W and C for that combination.

This approach, however, requires rather a lot of memory: around 800MB for an input file of ~100MB.

Are there any clever tricks or alternative methods that I could use to reduce the memory requirements? I note that for any given column 2/column 3 combination, keys with zeroed values are stored the first time it is encountered. This is done because the output file is required in the format shown above, where '0' is filled in. This may be increasing memory usage further when the zeros could instead be added afterward (perhaps during printing), but I'm not entirely sure whether it does, or how I would do this.
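A minimal sketch of filling the zeros in at print time rather than storing them, assuming Perl 5.10 or later (for the `//` defined-or operator) and numeric values in columns 2 and 3. The helper names tally_hits and format_hits are hypothetical, and the numeric sort order is an assumption based on the sample output above.

```perl
use strict;
use warnings;
use List::Util qw(first);

# Tally strands per (column 2, column 3) pair without pre-seeding zeros.
# tally_hits() and format_hits() are hypothetical names used for this sketch.
sub tally_hits {
    my ($in) = @_;
    chomp(my $header = <$in>);
    my @headers = split /\t/, $header;
    my $index = first { $headers[$_] eq 'Strand' } 0 .. $#headers;
    die "No 'Strand' column found\n" unless defined $index;

    my %hits;
    while (my $line = <$in>) {
        chomp $line;
        my @F = split /\t/, $line;
        # Autovivification creates the nested hashes as needed;
        # only strands that actually occur get a key.
        $hits{ $F[$index + 1] }{ $F[$index + 2] }{ $F[$index] }++;
    }
    return \%hits;
}

# Build the output lines, supplying 0 for any strand never seen for a pair.
sub format_hits {
    my ($hits) = @_;
    my @lines;
    for my $col2 (sort { $a <=> $b } keys %$hits) {          # assumes numeric column 2
        for my $col3 (sort { $a <=> $b } keys %{ $hits->{$col2} }) {
            my $counts = $hits->{$col2}{$col3};
            push @lines, join "\t", $col2, $col3, $counts->{w} // 0, $counts->{c} // 0;
        }
    }
    return @lines;
}

# Example use (the file name is hypothetical):
# open my $IN, '<', 'input.txt' or die "Cannot open input: $!";
# print "$_\n" for format_hits(tally_hits($IN));
```

This stores one key fewer for every pair that only ever occurs on one strand. How much memory that saves in practice would need measuring, since much of the overhead is likely to come from the nested hashes themselves rather than from the zeroed values.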


In reply to Memory usage while tallying instances of lines in a .txt file by TJCooper
