I am trying to match a set of short sequences (about 7,000) against a much larger set of sequences stored in a hash (about 100 GB in memory). Because this is a huge task, I thought of using multi-threading to search 40 sequences at a time. The script does create the threads, but I run into a RAM problem once 40 threads are running. It seems that each thread takes its own copy of the hash (about 100 GB each), so the program gets killed partway through. I am using a system with 512 GB of RAM and 88 threads.

Is there a memory-efficient way of doing this?

Thank you,
Santosh

#!/usr/bin/perl
use strict;
use threads;

my $num_of_threads = 40;

my @peptides = ();   ## Store the peptides to search in array
open(IN, "peptides.txt") or die "Could not open the file:$!\n";
while (<IN>) {
    chomp;
    $_ =~ s/\r//g;
    push(@peptides, $_);
}
close IN;

my %hashNR  = ();    ## Store the Sequence in this hash
my %hashRes = ();    ## Store the matched results
my $nrid = "";
open(REF, "NR.fasta") or die "Could not open the file:$!\n";
while (<REF>) {
    chomp;
    $_ =~ s/\r//g;
    if (/^>/) {
        $nrid = (split /\s/)[0];
    }
    else {
        $hashNR{$nrid} .= "$_";
    }
}
close REF;

my @allIDS = (keys %hashNR);
my $L = scalar(@allIDS);
print "Reference Reading Completed\n";

my $j = 0;
while ($j < scalar(@peptides)) {
    my @threads = initThreads();
    foreach (@threads) {
        my $pep = $peptides[$j];
        $_ = threads->create(\&doOperation, $pep, $L);
        $j++;
    }
    foreach (@threads) {
        $_->join();
    }
}

open(OUT, ">Outfile.txt") or die "Could not create the file:$!\n";
foreach my $k (keys %hashRes) {
    print "$k\t$hashRes{$k}\n";
}
close OUT;

###############################################
## Subroutine for initializing the Thread array
sub initThreads {
    my @initThreads;
    for (my $i = 1; $i <= $num_of_threads; $i++) {
        push(@initThreads, $i);
    }
    return @initThreads;
}

## Task run by each threads
sub doOperation {
    my @allp = @_;
    my $id = threads->tid();
    for (my $i = 0; $i < $allp[1]; $i++) {
        if ($hashNR{$allIDS[$i]} =~ /$allp[0]/) {
            $hashRes{$allp[0]} .= ",$allIDS[$i]";
        }
    }
    threads->exit();
}
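
Perl's ithreads clone every non-shared variable into each new thread, which would explain the per-thread copies of the 100 GB hash described above. One possible way around this, sketched here under assumptions rather than offered as a definitive fix, is to drop the hash and the threads altogether: keep only the ~7,000 peptides in memory and stream NR.fasta once, testing each record as it is read. The sketch assumes the peptides are plain literal strings (so it uses index() where the original uses a regex match) and that both file paths are passed on the command line. The subroutine names read_peptides, match_record and scan_fasta are made up for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Read the peptides (one per line) into an array; this is the only
# data set held in memory for the whole run.
sub read_peptides {
    my ($path) = @_;
    open(my $in, '<', $path) or die "Could not open $path: $!\n";
    my @peptides;
    while (my $line = <$in>) {
        $line =~ s/[\r\n]+\z//;
        push @peptides, $line if length $line;
    }
    close $in;
    return @peptides;
}

# Record every peptide that occurs as a literal substring of $seq.
# Assumption: peptides contain no regex syntax, so index() is enough.
sub match_record {
    my ($id, $seq, $peptides, $hits) = @_;
    return unless defined $id && length $seq;
    for my $pep (@$peptides) {
        push @{ $hits->{$pep} }, $id if index($seq, $pep) >= 0;
    }
}

# Stream the FASTA file, keeping only the current record in memory.
sub scan_fasta {
    my ($path, $peptides) = @_;
    open(my $ref, '<', $path) or die "Could not open $path: $!\n";
    my (%hits, $id);
    my $seq = '';
    while (my $line = <$ref>) {
        $line =~ s/[\r\n]+\z//;
        if ($line =~ /^>/) {
            match_record($id, $seq, $peptides, \%hits);  # finish previous record
            ($id) = split /\s/, $line;                   # same ID rule as the original
            $seq = '';
        }
        else {
            $seq .= $line;
        }
    }
    match_record($id, $seq, $peptides, \%hits);          # flush the last record
    close $ref;
    return \%hits;
}

# Only runs when both paths are supplied: peptides file, then FASTA file.
if (@ARGV == 2) {
    my @peptides = read_peptides($ARGV[0]);
    my $hits     = scan_fasta($ARGV[1], \@peptides);
    for my $pep (sort keys %$hits) {
        print "$pep\t", join(',', @{ $hits->{$pep} }), "\n";
    }
}
```

Saved under a hypothetical name such as scan.pl, it could be run as `perl scan.pl peptides.txt NR.fasta > Outfile.txt`. Memory use then depends on the peptide list, the longest single sequence, and the collected hits rather than on the size of the reference. It runs on one core, so whether it is fast enough for a 100 GB reference would need to be measured.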

In reply to Problem in RAM usage while threading the program by beherasan
