in reply to Re: write hash to disk after memory limit
in thread write hash to disk after memory limit

Thanks a lot. I've been using a hash-hash-array-array structure to keep memory use down. I think array access is also faster than hash access, so I did this:
foreach my $rat (@directories) {
    print "Reading Merged_99$rat/bs_seeker-CG.tab ...\n";
    open(FH,"<Merged_99$rat/bs_seeker-CG.tab") or die "cannot read Merged_99$rat/bs_seeker-CG.tab: $!";
    while (<FH>) {
        if (/M/) {
            next;
        } elsif ((/^chr(\S+)\s+(\d+)\s+\d+\s+(\d)\.(\d+)\s+(\d+)/) && ($1 ~~ @CHROMOSOMES) && ($5 >= $MINIMUM_COVERAGE)) {
            #chromosome $1, methylated C $2, percent $3.$4 and coverage $5
            $DATA{$1}{$2}[$set][$replicate] = "$3.$4";
        } elsif ((/^chr(\S+)\s+(\d+)\s+\d+\s+(\d)\s+(\d+)/) && ($1 ~~ @CHROMOSOMES) && ($4 >= $MINIMUM_COVERAGE)) {
            $DATA{$1}{$2}[$set][$replicate] = $3;
        }
    }
    close FH;
    $replicate++;
}

Re^3: write hash to disk after memory limit
by LanX (Saint) on Mar 13, 2015 at 13:41 UTC
    As I said, better

    > > organize the upper tier roughly according to the timeline of your process

    No idea where $set comes from but $replicate could be such a top tier.

    So $data[$set][$replicate]{$1}{$2} should have far fewer memory-swapping problems (AFAIS).
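A minimal sketch of that layout, for illustration only. The sample lines, `%wanted`, `$min_cov`, and the single combined regex are assumptions that stand in for the original input files, `@CHROMOSOMES`, `$MINIMUM_COVERAGE`, and the two `elsif` branches; the `/M/` skip is left out for brevity.

```perl
use strict;
use warnings;

# Hypothetical sample lines in the shape the original regexes expect:
# chrN, position, an ignored column, methylation fraction, coverage.
my @lines = (
    "chr1\t100\t5\t0.75\t12",
    "chr1\t200\t9\t1\t30",
    "chr2\t150\t2\t0.10\t3",    # coverage below the cut-off, skipped
);

my %wanted  = map { $_ => 1 } qw(1 2);    # stands in for @CHROMOSOMES
my $min_cov = 10;                         # stands in for $MINIMUM_COVERAGE
my ($set, $replicate) = (0, 0);           # assumed to be set by outer loops

# Top tiers follow the order in which the files are read:
# $data[$set][$replicate]{$chromosome}{$position}
my @data;
for my $line (@lines) {
    my ($chr, $pos, $frac, $cov) =
        $line =~ /^chr(\S+)\s+(\d+)\s+\d+\s+(\d(?:\.\d+)?)\s+(\d+)/
        or next;
    next unless $wanted{$chr} && $cov >= $min_cov;
    $data[$set][$replicate]{$chr}{$pos} = $frac;
}

printf "chr1 positions kept: %s\n",
    join ', ', sort { $a <=> $b } keys %{ $data[$set][$replicate]{1} };
```

With this ordering, all writes for one input file land in the same pair of inner hashes instead of touching an array under every chromosome/position key.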

    (BTW, better to reserve uppercase variable names for Perl built-ins.)

    If this structure doesn't fit into your future plans, you most likely want to use a DB anyway.
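As a rough illustration of moving the data out of RAM, here is a sketch that uses a disk-backed tied hash via the core `SDBM_File` module. This is a simple key/value file, not a full database, and is only a stand-in for whatever DB would actually be chosen. The file name and the composite key format are assumptions made for the example.

```perl
use strict;
use warnings;
use Fcntl qw(O_RDWR O_CREAT);
use File::Temp qw(tempdir);
use SDBM_File;

my $dir  = tempdir(CLEANUP => 1);
my $base = "$dir/methylation";    # hypothetical file name

tie my %store, 'SDBM_File', $base, O_RDWR | O_CREAT, 0666
    or die "cannot tie $base: $!";

# A DBM file holds flat key/value strings, so the four nesting levels
# are folded into one composite key: set:replicate:chromosome:position.
$store{ join ':', 0, 0, 1, 100 } = 0.75;
$store{ join ':', 0, 0, 1, 200 } = 1;
untie %store;                     # flush and close the file

# Reopen the file to show that the entries now live on disk.
tie my %reopened, 'SDBM_File', $base, O_RDWR, 0666
    or die "cannot reopen $base: $!";
my $value = $reopened{'0:0:1:100'};
my $count = scalar keys %reopened;
print "read back $value from $count stored entries\n";
```

SDBM limits the size of each key/value pair, which is fine for short numeric values like these; range queries per chromosome would still need a real database.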

    Cheers Rolf
    (addicted to the Perl Programming Language and ☆☆☆☆ :)

    PS: Je suis Charlie!

Re^3: write hash to disk after memory limit
by Jenda (Abbot) on Mar 14, 2015 at 00:59 UTC

    Do you later use the value as a string or as a number? If you use it as a number, I believe you could save quite a bit of memory by forcing a conversion before storing the data. The way you do it, you end up with a scalar containing both the string and (as soon as you use the number for the first time) the number.

    ...
            $DATA{$1}{$2}[$set][$replicate] = 0 + "$3.$4";
        } elsif ((/^chr(\S+)\s+(\d+)\s+\d+\s+(\d)\s+(\d+)/) && ($1 ~~ @CHROMOSOMES) && ($4 >= $MINIMUM_COVERAGE)) {
            $DATA{$1}{$2}[$set][$replicate] = 0 + $3;
    ...
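To see the effect described above, the core `B` module can report which internal scalar type perl picked. This is a small sketch, not a memory measurement; `sv_kind` is a hypothetical helper name and the values are made up.

```perl
use strict;
use warnings;
use B qw(svref_2object);

# Hypothetical helper: return the B class of a scalar, which reflects its
# internal type (B::PV = string slot only, B::NV = float slot only,
# B::PVNV = both a string slot and a float slot).
sub sv_kind { return ref( svref_2object( $_[0] ) ) }

my $as_string = "0.75";                  # stored the way "$3.$4" would be
my $before    = sv_kind(\$as_string);    # string-only scalar

my $sum       = $as_string + 1;          # first numeric use caches a number
my $after     = sv_kind(\$as_string);    # same scalar now holds both forms

my $raw       = "0.75";
my $forced    = 0 + $raw;                # convert before storing
my $numeric   = sv_kind(\$forced);       # number-only scalar

print "$before -> $after, forced: $numeric\n";
```

The scalar that was stored as a string grows once it is used in arithmetic, while the one converted up front stays a plain number. How much that saves in total depends on the perl build and the number of stored values.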

    Jenda
    Enoch was right!
    Enjoy the last years of Rome.