Of course, you can just do what lex does. But if every token starts with a well-chosen character class, the regex engine never has to backtrack over that first character (for example, eliminate the optional part of [-+]?\d+ by rewriting it as [-+\d]\d*). You can then use atomic groups, (?>...), to eliminate lots of backtracking, because you know that your SQL is well-formed, or rather, because it is of little concern to you if the SQL is not well-formed.
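A small sketch of the rewrite and of what the atomic group buys you. The variable names are made up for illustration. On ordinary signed numbers both patterns match the same text. The difference appears when something after the group needs a character the group already consumed: the plain pattern backtracks, the atomic one does not. Note that [-+\d]\d* also accepts a bare - or +, which is the price of assuming well-formed input.

```perl
use strict;
use warnings;

# Original form: optional sign, then one or more digits.
my $optional_sign = qr/[-+]?\d+/;

# Rewritten form: the first character comes from a single class, so
# there is no optional prefix to backtrack over. (?>...) makes the
# group atomic: once it has matched, it never gives characters back.
my $atomic_number = qr/(?>[-+\d]\d*)/;

for my $token (qw(42 -7 +300)) {
    my ($m1) = $token =~ /^($optional_sign)$/;
    my ($m2) = $token =~ /^($atomic_number)$/;
    print "$token: $m1 / $m2\n";
}

# The plain pattern backtracks \d* to free up the final "3";
# the atomic group keeps all three digits, so the match fails.
print "plain:  ", ("123" =~ /^[-+\d]\d*3/         ? "match" : "no match"), "\n";
print "atomic: ", ("123" =~ /^${atomic_number}3/ ? "match" : "no match"), "\n";
```

This prints "plain:  match" and "atomic: no match". In a tokenizer that never needs such a give-back, the atomic version simply fails faster.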
Ideally, you have a disjoint set of character classes that start the separate matches. Most likely, that set would be [-+\d] for numbers and ' for strings. If you want to be more careful, you can treat 0[bx]\w+ (binary and hexadecimal literals) more discerningly, but I wouldn't bother; I would simply assume that the SQL is well-formed.
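A minimal sketch of an abstractor built this way, assuming the goal is to replace numeric and string literals with ?. The sub name abstract_query is hypothetical. The identifier alternative [A-Za-z_]\w* is an addition of mine, so that digits inside names like t1 are kept; its first-character class does not overlap with [-+\d] or '. Treating backslash escapes and doubled quotes ('') inside strings is also an assumption about the SQL dialect. Known gaps, accepted under the well-formedness assumption: a bare - or + operator is replaced by ?, 3.14 becomes ?.?, and 0x/0b literals are not special-cased.

```perl
use strict;
use warnings;

# Hypothetical helper: replace numeric and string literals with '?'.
# Each alternative starts with a character class that no other
# alternative starts with, so at most one of them can apply at any
# position, and the atomic groups never need to give characters back.
sub abstract_query {
    my ($sql) = @_;
    $sql =~ s{
        ( [A-Za-z_]\w* )                      # identifier or keyword: keep as is
      | (?> [-+\d]\d* )                       # number, or a bare sign
      | ' (?> (?: [^'\\] | \\. | '' )* ) '    # quoted string with escapes
    }{ defined $1 ? $1 : '?' }gex;
    return $sql;
}

print abstract_query("SELECT * FROM t1 WHERE id = 42 AND name = 'it''s'"), "\n";
# prints: SELECT * FROM t1 WHERE id = ? AND name = ?
```

Characters that start none of the alternatives (spaces, =, *, parentheses, commas) are skipped by s///g and left in place.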
In reply to Re^3: In search of an efficient query abstractor
by Corion
in thread In search of an efficient query abstractor
by xaprb