in reply to In search of an efficient query abstractor

There are potentially other small improvements. You could drop:
$query =~ s/[\n\r\f]+/ /g;
since you already collapsed whitespace on the line before.

You might get better performance out of:

$query =~ s/(["']).*?\1/S/g;
by replacing it with:
$query =~ s/"[^"]*"/S/g; $query =~ s/'[^']*'/S/g;
This avoids the capture group and backreference, which may or may not help. You could also handle double-quoted (") strings earlier in the sequence of regexen, if that would reduce the amount of number-like text exposed to your regex for real numbers and floats.
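If you want to measure rather than guess, here is a rough sketch using the core Benchmark module. The sample query and the sub names are made up for illustration, and the query is assumed to have had its whitespace collapsed already, so treat the timings as indicative only.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

# Hypothetical sample query; whitespace is assumed to be collapsed already.
my $sample = q{select * from t where a = 'foo' and b = "bar" and c in ('x', "y")};

# Original form: one regex with a capture group and a backreference.
sub abstract_backref {
    my ($query) = @_;
    $query =~ s/(["']).*?\1/S/g;
    return $query;
}

# Suggested form: one negated character class per quote type, no capture.
sub abstract_classes {
    my ($query) = @_;
    $query =~ s/"[^"]*"/S/g;
    $query =~ s/'[^']*'/S/g;
    return $query;
}

# Check that both forms agree on this input before timing them.
print abstract_backref($sample), "\n";
print abstract_classes($sample), "\n";

# Run each variant for at least one CPU second and compare the rates.
cmpthese(-1, {
    backref => sub { abstract_backref($sample) },
    classes => sub { abstract_classes($sample) },
});
```

One caveat: the two forms are not strictly equivalent. A single-quoted string that contains a lone double quote, such as 'a"b', can make the double-quote pass start its match inside that string, so check against your real queries before switching.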

Finally, in:

$query =~ s{ \b(in|values?)\s*\(\s*([NS])\s*,[^\)]*\) } {$1($2+)}gx;
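You can take advantage of the fact that you have already normalized whitespace and match plain spaces instead of \s. Because of the /x flag, a literal space has to be written as [ ] (or "\ "); a bare space in the pattern would be ignored and the * would bind to the preceding group:
$query =~ s{ \b(in|values?)[ ]*\([ ]*([NS])[ ]*,[^\)]*\) } {$1($2+)}gx;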
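A small sketch of how /x treats spaces, in case it is useful. Under /x an unescaped space in the pattern is ignored, so a literal space needs to go in a bracketed class (or be backslash-escaped). The input string is a made-up, already-abstracted fragment, and the variable names are only for illustration.

```perl
use strict;
use warnings;

# Under /x the bare space is dropped, so /a b/x behaves like /ab/.
my $bare_matches_ab    = ("ab"  =~ /a b/x)   ? 1 : 0;   # matches
my $bare_matches_a_b   = ("a b" =~ /a b/x)   ? 1 : 0;   # does not match
my $class_matches_a_b  = ("a b" =~ /a[ ]b/x) ? 1 : 0;   # matches

print "bare vs 'ab': $bare_matches_ab\n";
print "bare vs 'a b': $bare_matches_a_b\n";
print "class vs 'a b': $class_matches_a_b\n";

# Hypothetical abstracted fragment: N is the number placeholder.
my $input = 'select a from t where b in (N, N, N)';

# "[ ]*" matches zero or more literal spaces even with /x in effect.
(my $collapsed = $input) =~ s{ \b(in|values?)[ ]*\([ ]*([NS])[ ]*,[^\)]*\) }{$1($2+)}gx;

print "$collapsed\n";
```

On this input the list collapses to in(N+); the space between "in" and "(" is consumed by the match.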

Re^2: In search of an efficient query abstractor
by xaprb (Scribe) on Dec 07, 2008 at 20:40 UTC

    Good points. It occurs to me that I could just point you to my profiling results in case you really have a lot of time on your hands. See my earlier reply -- the regex that turns floats/reals into N is way more expensive than the others.

Re^2: In search of an efficient query abstractor
by xaprb (Scribe) on Dec 08, 2008 at 01:02 UTC
    Replacing the single quoted-string regex with single- and double-quote specific ones is indeed more efficient.