Or lowercase each probe string; that's probably cheaper. Of course, if the original data is already all in the same case, some of that overhead can be avoided. If the probe strings or the large string are used multiple times, it may be worthwhile to preprocess the data.
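
As a rough sketch of the preprocessing idea, assuming Python and plain substring matching (the original question's language and search method aren't shown here, and `normalize` and `find_probes` are hypothetical names for illustration), you'd normalize each string once up front and reuse the results across searches:

```python
def normalize(text: str) -> str:
    """Lowercase a string so comparisons ignore case (assumed normalization)."""
    return text.lower()


def find_probes(haystack_lower: str, probes_lower: list[str]) -> list[bool]:
    """Report, for each pre-lowercased probe, whether it occurs in the
    pre-lowercased large string. No case conversion happens here."""
    return [probe in haystack_lower for probe in probes_lower]


# Preprocess once...
large = normalize("The Quick Brown Fox Jumps Over The Lazy Dog")
probes = [normalize(p) for p in ["quick", "FOX", "Cat"]]

# ...then reuse the normalized data for as many searches as needed.
hits = find_probes(large, probes)
```

If the data is known to be all one case already, the `normalize` calls can simply be skipped. For non-ASCII text, `str.casefold()` is probably a safer choice than `str.lower()`.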