in reply to Re: pattern match, speed problem
in thread pattern match, speed problem

And the large string needs to be uppercased, since the pattern match was case insensitive.

Re^3: pattern match, speed problem
by hipowls (Curate) on Feb 20, 2008 at 07:56 UTC

    Or lower-case each probe string instead; that's probably cheaper, since only the short probes get converted rather than the whole large string. Of course, if the original data is already all in one case, some of that overhead can be avoided. If the probe strings or the large string are used multiple times, it may be worthwhile to preprocess the data once:

    perl -i -pe 'tr/acgt/ACGT/' big_string_file
    or
    perl -i -pe 'tr/ACGT/acgt/' probe_string_file
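
For what it's worth, here is a minimal sketch of the "lower-case each probe" idea done inside the script rather than on the files. It assumes the large string is already all lower case, and that a fixed-string search with index() is acceptable in place of the case-insensitive pattern match. The sample data and the count_hits helper are made up for illustration; they are not from the original poster's code.

```perl
use strict;
use warnings;

# Hypothetical sample data: the large string is assumed to be lower case already.
my $big_string = 'ttgacgtacgtnnacgtta';
my @probes     = ('ACGT', 'cgtt', 'GGGG');

# Count (possibly overlapping) occurrences of $needle in $haystack.
# Uses index(), so no regex and no /i flag are involved.
sub count_hits {
    my ($haystack, $needle) = @_;
    return 0 if $needle eq '';    # guard: an empty needle would loop forever
    my ($count, $pos) = (0, 0);
    while (($pos = index($haystack, $needle, $pos)) >= 0) {
        $count++;
        $pos++;                   # advance one character to allow overlaps
    }
    return $count;
}

# Lower-case each short probe once, then search the untouched large string.
my %hits = map { $_ => count_hits($big_string, lc $_) } @probes;

print "$_: $hits{$_}\n" for @probes;
```

If the large string is mixed case instead, a single lc on it before the loop would be needed, which is the cost the parent post was pointing out.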