in reply to Multiple Regex's on a Big Sequence

index is usually faster at searching for a constant string than a regexp, but I haven't timed /g against its index equivalent. Here's the (untested) code:

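    # $sequence and $chr are assumed to be set by the surrounding code.
    local $, = "\t";    # field separator for print
    local $\ = "\n";    # record separator appended by print

    foreach my $short (qw( CACGTG GTGCAC )) {
        my $pos = 0;
        my $len = length($short);
        while (($pos = index($sequence, $short, $pos)) >= 0) {
            print $chr, '+', substr($sequence, $pos, $len);
            $pos += $len;    # resume after this match (no overlaps)
        }
    }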

As an added bonus, replace $pos += $len; with $pos++; to allow overlapping matches.
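To make the difference concrete, here is a minimal sketch that collects match offsets either way. The sub name `find_offsets` and its `$overlap` flag are made up for illustration, and the `'AAAA'` / `'AA'` inputs are toy data chosen because the matches overlap.

```perl
use strict;
use warnings;

# Return the 0-based start offsets of every occurrence of $motif in $sequence.
# $overlap is a hypothetical flag: true advances by 1 (overlapping matches),
# false advances by the motif length (non-overlapping matches).
sub find_offsets {
    my ($sequence, $motif, $overlap) = @_;
    return unless length $motif;    # an empty motif would loop forever
    my $step = $overlap ? 1 : length $motif;
    my @offsets;
    my $pos = 0;
    while (($pos = index($sequence, $motif, $pos)) >= 0) {
        push @offsets, $pos;
        $pos += $step;
    }
    return @offsets;
}

my @plain   = find_offsets('AAAA', 'AA', 0);    # offsets 0 and 2
my @overlap = find_offsets('AAAA', 'AA', 1);    # offsets 0, 1 and 2
print "non-overlapping: @plain\n";
print "overlapping:     @overlap\n";
```

Stepping by 1 costs more calls to index when hits are dense, but it does not skip a motif that starts inside the previous hit.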

If you do use regexps, @- and @+ can be used instead of pos. Specifically, substr($sequence, $-[0], $+[0] - $-[0]) will return the matched text. See perlvar.
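For the regexp route, something along these lines should work. It is an untested-in-anger sketch: the sub name `regex_hits` and the sample sequence are invented for the example, and joining the motifs into one alternation is just one way to do it. Only `@-`, `@+`, `substr` and `quotemeta` are standard Perl.

```perl
use strict;
use warnings;

# Return a list of [start offset, matched text] pairs for any of @motifs.
# Matches are non-overlapping, as is usual for a /g loop.
sub regex_hits {
    my ($sequence, @motifs) = @_;
    return unless @motifs;          # avoid the special-cased empty pattern
    my $alt = join '|', map { quotemeta } @motifs;
    my $re  = qr/$alt/;
    my @hits;
    while ($sequence =~ /$re/g) {
        # $-[0] is the start of the whole match, $+[0] is one past its end.
        my ($start, $end) = ($-[0], $+[0]);
        push @hits, [ $start, substr($sequence, $start, $end - $start) ];
    }
    return @hits;
}

my @hits = regex_hits('TTCACGTGAAGTGCACTT', qw( CACGTG GTGCAC ));
print join("\t", @$_), "\n" for @hits;
```

Copying `$-[0]` and `$+[0]` into lexicals straight after the match is a habit worth keeping, since any later successful match overwrites them.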

Re^2: Multiple Regex's on a Big Sequence
by bernanke01 (Beadle) on Aug 16, 2006 at 15:35 UTC
    Ahh, I never even thought of index. And it has the nice advantage of being parallelizable across CPUs just like a regex. I'll benchmark the three approaches (index, separate regexes, and one combined regex) and report back.
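A rough sketch of how such a comparison might be set up with the core Benchmark module. The test sequence is made up, the sub names are hypothetical, and no timings are claimed here. Note that the combined regex counts non-overlapping hits across both motifs at once, so its total can differ from the other two when hits for different motifs overlap (they do not in this test data).

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

my @motifs   = qw( CACGTG GTGCAC );
my $sequence = 'TTCACGTGAAGTGCACTT' x 1000;    # made-up test data

# Approach 1: index, one pass per motif.
sub count_index {
    my $n = 0;
    for my $motif (@motifs) {
        my ($pos, $len) = (0, length $motif);
        while (($pos = index($sequence, $motif, $pos)) >= 0) {
            $n++;
            $pos += $len;
        }
    }
    return $n;
}

# Approach 2: one regex per motif, one pass each.
sub count_regex {
    my $n = 0;
    for my $motif (@motifs) {
        $n++ while $sequence =~ /\Q$motif\E/g;
    }
    return $n;
}

# Approach 3: a single alternation, one pass over the sequence.
my $alt         = join '|', map { quotemeta } @motifs;
my $combined_re = qr/$alt/;
sub count_combined {
    my $n = 0;
    $n++ while $sequence =~ /$combined_re/g;
    return $n;
}

# A negative count tells Benchmark to run each sub for at least that many CPU seconds.
cmpthese(-1, {
    'index'    => \&count_index,
    'regex'    => \&count_regex,
    'combined' => \&count_combined,
});
```

Results will depend heavily on sequence length, motif count and Perl version, so it is worth running this against real data before drawing conclusions.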