in reply to Re^4: Memory Usage in Regex On Large Sequence
in thread Memory Usage in Regex On Large Sequence

You could use Devel::SawAmpersand to check whether $& and its friends ($` and $') have been used somewhere in a library.

However, I tried running the following minimal emulation of your code:

#! perl -slw
use strict;
use Carp;
use threads;
use threads::shared;
$| = 1;

our $N    ||= 2;      ## number of threads (override with -N=nn)
our $SIZE ||= 1e6;    ## repeats of 'ACTG' (override with -SIZE=nn)

my $semaphore :shared = 0;

'abc' =~ m[b] and print "$`:$&:$'";    ## Use ampersand.

my $bigString = 'ACTG' x $SIZE;

for ( 1 .. $N ) {
    async {
        printf "Thread %s starting\n", threads->tid;
        my $count = 0;
        while( $bigString =~ m[ACTG]g ) {
            #lock $semaphore;
            #print threads->tid, ' : ', pos( $bigString );
            ++$count;
        }
        printf "Thread %s stopping ($count)\n", threads->tid;
    };
}

## Wait for every thread to finish before exiting (portable, no Win32 calls).
$_->join for threads->list;

I ran it both with and without the line marked "## Use ampersand", and it doesn't cause a crash on my system, even when running 100 threads against a 10e6-char sequence. It runs hugely more slowly, but that is expected.
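For reference, a minimal sketch of the usual way to avoid the $& slowdown, assuming Perl 5.10 or later: the /p match modifier makes ${^PREMATCH}, ${^MATCH} and ${^POSTMATCH} available for that one match, so the program never needs to mention $`, $& or $'. From Perl 5.20 on, copy-on-write strings removed the global penalty and /p became a no-op, but these variables still work. The split_around helper is a hypothetical name used only for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Match with the /p modifier (Perl 5.10+): ${^PREMATCH}, ${^MATCH} and
# ${^POSTMATCH} are set for this match, so the program never mentions
# $`, $& or $' and does not switch on copying for every regex match.
sub split_around {
    my ( $string, $re ) = @_;
    return unless $string =~ m/$re/p;    # empty list when there is no match
    return ( ${^PREMATCH}, ${^MATCH}, ${^POSTMATCH} );
}

print join( ':', split_around( 'abc', qr/b/ ) ), "\n";    # prints a:b:c
```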

The only thing I can see missing from my simplified version is Bio::SeqIO (darn thing will never install here!). As a test, you could substitute this crude FASTA sequence loader (taken from Re: Forking Multiple Regex's on a Single String (use threads)):

## Crude fasta load--Expects 1 sequence per file
open my $fh, '<', $path or croak "$path : $!\n";
<$fh>;    ## discard header
( my $sequence = do{ local $/; <$fh> } ) =~ s[\s+][]g;
close $fh;

and remove the dependency on that module, then see what difference, if any, that makes.

Beyond that, you could try running my emulation above on your system to see whether it also causes the "Out of memory" failure.


Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
Lingua non convalesco, consenesco et abolesco. -- Rule 1 has a caveat! -- Who broke the cabal?
"Science is about questioning the status quo. Questioning authority".
In the absence of evidence, opinion is indistinguishable from prejudice.

Re^6: Memory Usage in Regex On Large Sequence
by bernanke01 (Beadle) on Sep 26, 2006 at 22:41 UTC

    Hi Browser,

    Yup, removing Bio::SeqIO doesn't resolve the crashes, but it dramatically speeds up execution. Further, the program runs quite a bit deeper before crashing. I also tried reducing the number of threads in the BioPerl-less program, and found that the fewer threads active at any one time, the deeper execution can go. So, with just 3 threads it completes, but with 5 it terminates about 80% of the way through the dataset. That seems to suggest that the threads are using a lot of memory: is there a way of assessing the memory footprint of an individual thread?

    Also, you previously mentioned that it might be useful to create a pool of threads up front and reuse them. I'd like to try that, but I'm unsure how to do it and didn't see anything in perldoc threads (though I could have missed it). My thought here is that perhaps threads are "leaking" some memory on my system, and reusing them might help identify that.
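    One common shape for such a pool, sketched here under assumptions rather than taken from this thread: start a fixed number of workers once, feed them work through Thread::Queue (a core module), and queue one undef per worker as a stop signal. process_item, start_pool and finish_pool are hypothetical names, and process_item merely counts ACTG motifs where the real program would load and scan a chromosome.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use threads;
use Thread::Queue;
use List::Util qw(sum);

# Hypothetical per-item work: the real program would load and scan one
# chromosome here; this stand-in just counts 'ACTG' motifs in a string.
sub process_item {
    my ($item) = @_;
    my $count = () = $item =~ m[ACTG]g;
    return $count;
}

# Start $n_workers threads once. Each one loops, pulling items from the
# shared queue, until it dequeues undef (the stop signal).
sub start_pool {
    my ($n_workers) = @_;
    my $queue = Thread::Queue->new;
    my @workers = map {
        threads->create( { 'context' => 'list' }, sub {
            my @results;
            while ( defined( my $item = $queue->dequeue ) ) {
                push @results, process_item($item);
            }
            return @results;
        } );
    } 1 .. $n_workers;
    return ( $queue, @workers );
}

# Send one stop signal per worker, then collect every worker's results.
sub finish_pool {
    my ( $queue, @workers ) = @_;
    $queue->enqueue(undef) for @workers;
    return map { $_->join } @workers;
}

# Start the pool before loading any large data, then feed it work.
my ( $queue, @workers ) = start_pool(3);
$queue->enqueue( 'ACTG' x $_ ) for 1 .. 10;
my @counts = finish_pool( $queue, @workers );
print 'Total matches: ', sum( 0, @counts ), "\n";    # prints Total matches: 55
```

    Starting the workers before the big sequences are loaded matters with ithreads, because each new thread gets its own copy of whatever non-shared data exists at the moment it is created; workers started early stay small, and the sequences reach them only through the queue.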

    Many thanks (again) for your help.

      is there a way of assessing the memory footprint of an individual thread?

      I know how to do that on Win32, but I've no experience of threads on *nix. Maybe top will show you something?
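      On Linux, a rough alternative to top is to read the process's own VmRSS line from /proc. This is a minimal sketch under that Linux-only assumption, and rss_kb is a hypothetical helper. Perl ithreads all live inside one process, so the figure is the total for every thread together; comparing it before and after spawning threads gives an approximate per-thread cost.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use threads;

# Linux-only assumption: read this process's resident set size (VmRSS, in kB)
# from /proc/$$/status. Returns undef if the file or the line is missing.
# All Perl ithreads share one process, so this is the total for all threads.
sub rss_kb {
    open my $fh, '<', "/proc/$$/status" or return;
    while ( my $line = <$fh> ) {
        return $1 if $line =~ m[^VmRSS:\s+(\d+)\s+kB];
    }
    return;
}

my $seq    = 'ACTG' x 5_000_000;    # 20 million characters of test data
my $before = rss_kb();

# Each new thread is created with its own copy of $seq, so RSS should grow
# by roughly one copy per thread while the threads are still unjoined.
my @thr    = map { threads->create( sub { length $seq } ) } 1 .. 3;
my $during = rss_kb();
$_->join for @thr;

printf "RSS before threads: %s kB, with 3 unjoined threads: %s kB\n",
    $before // 'n/a', $during // 'n/a';
```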

      How big are the sequences you are searching?


        I don't have privileges for top, but I've just put in a request for them, so I should be able to test it in a day or two. These are complete chromosomes, so they range in size from about 20 kbp (20,000 letters) for the mitochondria to about 250 million bp for chromosome 1.
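        A rough back-of-the-envelope figure, assuming one byte per base in a plain (non-UTF-8) Perl string: chromosome 1 at about 250 million bp is roughly 250 MB per copy. Perl ithreads give each new thread its own copy of every non-shared variable that exists when the thread is created, so if a chromosome were already loaded when five threads are spawned, the duplicated sequence alone could approach 5 x 250 MB = 1.25 GB. The small sketch below demonstrates that copying (a_count_in_thread is a hypothetical name): a thread mutates $seq, and the parent's copy is unchanged.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use threads;

# Each ithread starts with its own copy of every non-shared variable that
# exists when it is created. Changing $seq inside the thread therefore does
# not touch the parent's $seq: they are two separate strings in memory.
my $seq = 'ACGT' x 1000;    # 4,000 bases, 1,000 of them 'A'

sub a_count_in_thread {
    my $thr = threads->create( sub {
        $seq =~ tr/A/N/;               # mutate the thread's private copy
        return ( $seq =~ tr/A// );     # count the 'A's left in that copy
    } );
    return $thr->join;
}

my $inside  = a_count_in_thread();
my $outside = ( $seq =~ tr/A// );
print "A's left inside thread: $inside, in parent: $outside\n";
# prints A's left inside thread: 0, in parent: 1000
```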