Hm. Should've asked that question earlier :) It's been so long since I used a system without admin privileges that I forgot that some places (needlessly) restrict 'users' from even the most elementary of debugging and system information tools. I just assumed that you would have already verified whether you were simply asking for more memory than your OS would let a process have.

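If you are trying to run 10 concurrent threads, each having loaded 1/4 GB of data, you're quite likely to be blowing the process memory limit: that is 2.5 GB of data alone, before any per-thread overhead. On a 32-bit Intel system the per-process limit is likely to be 2 GB (the exact figure depends on the OS).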

Even if you are on hardware that allows much larger process memory, it is possible that there are admin-imposed memory limits coming into play. That is something you would need to ask your admin about.

I realise that many of the chromosomes are probably well under that 1/4 GB size, and the problem will only arise if 10 big ones come together, but given that the bigger chromosomes will take longer to process, it is almost inevitable that they will. I.e. even if the sizes are randomly distributed, the small ones will be processed quickly, and so you will nearly always end up in the situation where you are trying to process 10 big ones at the same time.
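
A rough way to check that intuition is to simulate a fixed pool of threads. The sketch below is in Python purely for illustration and rests on one assumption that is mine, not measured: processing time is exactly proportional to size (1 time unit per MB). The function name and the size mix in the demo are hypothetical.

```python
import heapq
import random


def peak_memory_mb(sizes_mb, workers=10):
    """Simulate a fixed-size thread pool and return the peak combined
    size (in MB) of the items in flight at any one time.

    Assumption: processing time is proportional to size (1 unit per MB).
    """
    running = []   # min-heap of (finish_time, size_mb) for in-flight items
    now = 0.0      # simulated clock
    in_use = 0.0   # combined size of in-flight items
    peak = 0.0
    for size in sizes_mb:
        if len(running) == workers:
            # Pool is full: jump to the earliest completion and free its memory.
            now, done_size = heapq.heappop(running)
            in_use -= done_size
        # Start the next item on the free thread.
        heapq.heappush(running, (now + size, size))
        in_use += size
        peak = max(peak, in_use)
    return peak


if __name__ == "__main__":
    random.seed(1)
    # Hypothetical mix of sizes in MB; substitute your real file sizes.
    sizes = [random.choice([30, 60, 120, 250]) for _ in range(200)]
    print("peak in flight (MB):", peak_memory_mb(sizes))
    print("ten average-sized items (MB):", 10 * sum(sizes) / len(sizes))
```

Feeding it your actual file sizes (in processing order) gives an estimate of the worst case a 10-thread pool would hit, which you can compare against the process limit.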

One way around this would be to alter your thread management strategy accordingly. Instead of limiting by the number of threads running, limit by the combined size of the chromosomes you are processing:

  1. As you load each chromosome, add its size to a running total.
  2. If

    the running total of data loaded + 10 MB * the number of running threads + the size of the next chromosome (the file size should be a good enough approximation for this purpose)

    is less than the memory limit for your process, start another thread on the next chromosome.

  3. Otherwise, wait until a thread terminates, subtract its memory usage from the running total, and check again.

There are various ways you could adjust that algorithm to balance the number of threads against the memory consumed, but as is it should prevent the current problem, assuming that process memory is being exceeded, as now seems likely.
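
The three steps above can be sketched as follows. This is a minimal illustration in Python rather than the threaded Perl you are running; only the bookkeeping is the point. The names `MemoryBudget`, `run_all` and `process` are hypothetical, the 10 MB per-thread figure is the allowance from step 2, and the rule that a lone oversized item is let through (so the loop cannot deadlock) is my own addition.

```python
import threading


class MemoryBudget:
    """Tracks loaded data and blocks new work until it fits the limit."""

    def __init__(self, limit_bytes, per_thread_overhead=10 * 1024 * 1024):
        self.limit = limit_bytes
        self.overhead = per_thread_overhead  # step 2's 10 MB per thread
        self.loaded = 0                      # running total of data loaded
        self.running = 0                     # number of running threads
        self.cond = threading.Condition()

    def acquire(self, size):
        with self.cond:
            # Step 2: start only if total + overhead + next size < limit.
            # Step 3: otherwise wait for a thread to finish and re-check.
            # Assumption: if nothing is running, let the item through
            # anyway, so one oversized item cannot block forever.
            while (self.running > 0 and
                   self.loaded + self.overhead * self.running + size >= self.limit):
                self.cond.wait()
            self.loaded += size              # step 1: add to running total
            self.running += 1

    def release(self, size):
        with self.cond:
            self.loaded -= size              # step 3: subtract on termination
            self.running -= 1
            self.cond.notify_all()


def run_all(jobs, process, budget):
    """jobs: iterable of (name, size_in_bytes); process: callable(name)."""
    threads = []
    for name, size in jobs:
        budget.acquire(size)                 # blocks until the job fits

        def worker(name=name, size=size):
            try:
                process(name)
            finally:
                budget.release(size)         # always give the memory back

        t = threading.Thread(target=worker)
        t.start()
        threads.append(t)
    for t in threads:
        t.join()
```

In real use the size passed for each job would be the file size (for example from `os.path.getsize`), and `limit_bytes` would be set somewhat below whatever your OS or admin actually allows.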


Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
Lingua non convalesco, consenesco et abolesco. -- Rule 1 has a caveat! -- Who broke the cabal?
"Science is about questioning the status quo. Questioning authority".
In the absence of evidence, opinion is indistinguishable from prejudice.

In reply to Re^9: Memory Usage in Regex On Large Sequence by BrowserUk
in thread Memory Usage in Regex On Large Sequence by bernanke01
