in thread Memory Usage in Regex On Large Sequence

Hi,

It's 10 threads, but the problem also occurs with 3. For now, the motifs are just six strings of 5 or 6 letters from GCAT; in theory I'd like to use more complex motifs, of course. Here is my current motif list:

GCGTG GTGCG CACGC CGCAC CACGTG GTGCAC
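A minimal sketch of scanning for all six motifs in a single pass. It assumes the goal is to report every hit, overlapping ones included, together with its offset. find_motifs is a hypothetical helper, not code from this thread. A zero-width lookahead lets overlapping hits be found, the offset is read from $-[0] rather than from $` or $&, and the sequence is passed by reference so that a large string is not copied into the sub.

```perl
use strict;
use warnings;

# The motif list from the post above.
my @motifs = qw(GCGTG GTGCG CACGC CGCAC CACGTG GTGCAC);

# Hypothetical helper: report every motif hit in $$seqref as [offset, motif].
# The sequence is passed by reference so a large string is not copied.
sub find_motifs {
    my ($seqref, @wanted) = @_;

    # Longest first: only one alternative is reported per offset, so a longer
    # motif wins over a shorter one that is its prefix. (None of the six
    # motifs above is a prefix of another, so no hit is lost here.)
    my $alt = join '|',
        map  { quotemeta($_) }
        sort { length($b) <=> length($a) } @wanted;

    # Zero-width lookahead: the match consumes nothing, so overlapping hits
    # are found. Perl never matches the empty string twice at the same
    # position, so the scan still moves forward.
    my $re = qr/(?=($alt))/;

    my @hits;
    while ( $$seqref =~ /$re/g ) {
        push @hits, [ $-[0], $1 ];    # offset from @-, never $& or $`
    }
    return @hits;
}

my $demo = 'CACGTGCAC';
printf "%d\t%s\n", @$_ for find_motifs( \$demo, @motifs );
# prints "0<TAB>CACGTG" then "3<TAB>GTGCAC"
```

The two reported hits overlap (offsets 0 to 5 and 3 to 8), which a plain /($alt)/g loop without the lookahead would miss.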

Re^3: Memory Usage in Regex On Large Sequence
by dave_the_m (Monsignor) on Sep 25, 2006 at 19:38 UTC
    In that case, I can't see any good reason why replacing index(..,"GTGCAC",...) with /GTGCAC/ should use any more memory, with one exception: if $sequence is very large and $`, $& or $' is used anywhere in the program (e.g. in an included module), then perl has to take a complete copy of $sequence on every successful match, which might push you over the threshold.

    Dave.
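To make the comparison concrete, here is a sketch of the two approaches side by side. It assumes the task is to collect every start offset of one motif, and positions_index and positions_regex are hypothetical names. The regex version reads each offset from @- and never mentions $`, $& or $', so by itself it should not trigger the whole-string copy described above.

```perl
use strict;
use warnings;

# Hypothetical helper: every start offset of $motif in $$seqref, via index().
# Searching again from $p + 1 means overlapping hits are reported too.
sub positions_index {
    my ($seqref, $motif) = @_;
    my @pos;
    my $p = index $$seqref, $motif;
    while ( $p >= 0 ) {
        push @pos, $p;
        $p = index $$seqref, $motif, $p + 1;
    }
    return @pos;
}

# Hypothetical helper: the same offsets via a regex. The start of each hit
# comes from $-[0], so $`, $& and $' are never mentioned; resetting pos()
# to one past that start mirrors the index() loop and allows overlaps.
sub positions_regex {
    my ($seqref, $motif) = @_;
    my @pos;
    while ( $$seqref =~ /\Q$motif\E/g ) {
        push @pos, $-[0];
        pos($$seqref) = $-[0] + 1;
    }
    return @pos;
}

my $seq = 'GTGCACGTGCAC';
print join( ',', positions_index( \$seq, 'GTGCAC' ) ), "\n";   # prints 0,6
print join( ',', positions_regex( \$seq, 'GTGCAC' ) ), "\n";   # prints 0,6
```

Both subs take the sequence by reference, so neither makes its own copy of a large $sequence.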

      Ahh, interesting -- is there a way to verify if that is happening, or to prevent the copying from going on?

        You could check whether $& and friends are being used somewhere in a library with Devel::SawAmpersand.
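On the "prevent" half of the question, a sketch assuming Perl 5.10 or later: the /p modifier makes ${^PREMATCH}, ${^MATCH} and ${^POSTMATCH} available for that one match, and @- and @+ give the same information as offsets, so $`, $& and $' never need to appear. A common hidden source of $& is a plain "use English;", which the -no_match_vars import avoids. perlvar also notes that since Perl 5.20 copy-on-write removes the program-wide penalty of these variables. split_on_match and split_on_offsets are hypothetical helper names.

```perl
use strict;
use warnings;

# Plain "use English;" also aliases $PREMATCH/$MATCH/$POSTMATCH to $`, $&
# and $', which on perl 5.18 and earlier imposes the copy on every match in
# the program. -no_match_vars leaves those three aliases out.
use English qw(-no_match_vars);

# Hypothetical helper: (before, match, after) for the first hit of $re.
# /p makes ${^PREMATCH}, ${^MATCH} and ${^POSTMATCH} available for this
# match only, instead of enabling $`, $& and $' for the whole program.
sub split_on_match {
    my ($str, $re) = @_;
    return unless $str =~ /$re/p;
    return ( ${^PREMATCH}, ${^MATCH}, ${^POSTMATCH} );
}

# Hypothetical helper: the same three pieces without any match variables,
# cut out of the string with the offsets in @- and @+.
sub split_on_offsets {
    my ($str, $re) = @_;
    return unless $str =~ /$re/;
    my ($start, $end) = ( $-[0], $+[0] );
    return ( substr( $str, 0, $start ),
             substr( $str, $start, $end - $start ),
             substr( $str, $end ) );
}

print join( ':', split_on_match( 'abc', qr/b/ ) ), "\n";     # prints a:b:c
print join( ':', split_on_offsets( 'abc', qr/b/ ) ), "\n";   # prints a:b:c
```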

        However, I tried running the following minimal emulation of your code:

        #! perl -slw
        use strict;
        use Carp;
        use threads;
        use threads::shared;
        $| = 1;

        our $N    ||= 2;     ## number of threads (set with -N=nn)
        our $SIZE ||= 1e6;   ## copies of 'ACTG' (set with -SIZE=nn)

        my $semaphore :shared = 0;
        my $running   :shared = 0;

        'abc' =~ m[b] and print "$`:$&:$'"; ## Use ampersand.

        my $bigString = 'ACTG' x $SIZE;

        for ( 1 .. $N ) {
            async {
                printf "Thread %s starting\n", threads->tid;
                { lock $running; ++$running; }   ## shared counter: lock to avoid lost updates
                my $count = 0;
                while( $bigString =~ m[ACTG]g ) {
                    #lock $semaphore;
                    #print threads->tid, ' : ', pos( $bigString );
                    ++$count;
                }
                { lock $running; --$running; }
                printf "Thread %s stopping ($count)\n", threads->tid;
            };
        }

        Win32::Sleep 100 until $running;
        Win32::Sleep 100 while $running;

        I ran it both with and without the line marked ## Use ampersand, and it doesn't crash on my system, even with 100 threads and a 10e6 char sequence. With that line in place it runs hugely more slowly, but that is expected.

        The only thing I can see missing from my simplified version is Bio::SeqIO (the darn thing will never install here!). As a test, you could substitute this crude FASTA loading code (taken from Re: Forking Multiple Regex's on a Single String (use threads)):

        ## Crude fasta load--Expects 1 sequence per file
        open my $fh, '<', $path or croak "$path : $!\n";
        <$fh>;    ## discard header
        ( my $sequence = do{ local $/; <$fh> } ) =~ s[\s+][]g;
        close $fh;

        and remove the dependency on that module, then see what difference, if any, that makes.
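If a file turns out to hold more than one sequence, the crude loader above would fold every later header line into $sequence. Here is a slightly less crude sketch, still without Bio::SeqIO. It assumes standard FASTA layout, meaning a '>' header line followed by sequence lines. read_fasta is a hypothetical helper that keys each record by the first word of its header.

```perl
use strict;
use warnings;

# Hypothetical helper: read every record from a FASTA filehandle into a hash
# of ID => sequence, with all whitespace stripped from the sequence. The ID
# is the first word after '>'; a repeated ID overwrites the earlier record.
sub read_fasta {
    my ($fh) = @_;
    my (%seq, $id);
    while ( my $line = <$fh> ) {
        chomp $line;
        if ( $line =~ /^>(\S*)/ ) {          # header line starts a new record
            $id = $1;
            $seq{$id} = '';
        }
        elsif ( defined $id ) {              # sequence line: append its bases
            ( my $bases = $line ) =~ s/\s+//g;
            $seq{$id} .= $bases;
        }
    }
    return \%seq;
}

# Usage with an in-memory file; for a real file, open $path instead.
my $fasta = ">seq1 first\nACGT\nGCGTG\n>seq2\nCACGTG\n";
open my $fh, '<', \$fasta or die "Cannot open in-memory file: $!";
my $seqs = read_fasta($fh);
close $fh;
print "$_: $seqs->{$_}\n" for sort keys %$seqs;   # prints "seq1: ACGTGCGTG", then "seq2: CACGTG"
```

Appending line by line with .= avoids slurping the whole file and then running a second whole-string s/// pass over it.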

        Beyond that, you could try running my emulation above on your system and see whether it also produces the Out of memory failure.


        Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
        Lingua non convalesco, consenesco et abolesco. -- Rule 1 has a caveat! -- Who broke the cabal?
        "Science is about questioning the status quo. Questioning authority".
        In the absence of evidence, opinion is indistinguishable from prejudice.