But from previous work that I've done, if you have a bunch of terms that are Or'd together (X|Y), the regex engine will handle them more efficiently if it can see them all at once, rather than having you run a separate regex for X and then another for Y.
One module that I have works on "regex piece parts". Each small piece is tested separately, and then Perl builds a humongous regex with all of them Or'd together. That regex gets dynamically compiled and used. For development, I can work on one of the pieces and regression test it before getting the rest of the regex zoo involved.
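Here is a minimal sketch of the "piece parts" idea. The piece names and patterns (`date`, `time`, `ipv4`) and the `find_all` helper are made up for illustration and are not from my module; the point is only that each `qr//` piece can be tested alone, and the combined regex is compiled once at run time.

```perl
use strict;
use warnings;

# Hypothetical piece parts; each one is small enough to test on its own.
my %piece = (
    date => qr/\d{4}-\d{2}-\d{2}/,
    time => qr/\d{2}:\d{2}(?::\d{2})?/,
    ipv4 => qr/(?:\d{1,3}\.){3}\d{1,3}/,
);

# Or all the pieces together and compile the result once.
my $alternation = join '|', map { $piece{$_} } sort keys %piece;
my $combined    = qr/\b(?:$alternation)\b/;

# Return every substring of $text matched by any of the pieces.
sub find_all {
    my ($text) = @_;
    my @hits = $text =~ /($combined)/g;
    return @hits;
}
```

During development you can match against a single `$piece{date}` by itself, then rerun the whole regression suite against `$combined`.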
The ability of Perl to dynamically create a regex and use it is something that is not nearly as convenient in C#, Java, etc. Sometimes this can work out very well. I have one piece of code that uses substr, some regex, and some program logic to write simple, somewhat overlapping Or terms to search for specific things. This has helped me in some situations where I'm trying to match "sort of like" XYZ.
Anyway, consider the possibility of a program-generated dynamic regex. As Larry Wall says, "programs that write programs are the happiest programs of all".
Update:
I didn't give a clear-cut example of dynamic regex, so here's one that is close to a real-world situation (it's a big simplification of actual code): let's say that I am trying to find the word ABCD, but according to the matching rules, I am going to allow one of the letters to be wrong, so for example AXCD matches. Now let's say that, furthermore, I will allow a single pair of letters to be transposed (this counts as one combined error). It is easy to algorithmically generate the combos: ABCD .BCD A.CD AB.D ABC. BACD ACBD ...etc. If I use a program to generate this long sequence of Or'd terms, then when the first letter is not an A, the regex engine will immediately rule out ABCD A.CD AB.D... etc. The regex engine builds a state machine that is pretty sophisticated, and it will execute quickly even if there are 30 terms in the "dumb" regex. If somebody here knows how to write a general regex that runs as quickly, or even knows how to do it at all with one general regex, I'd like to hear about it! The regex should be able to look for words with 3, 4, 5, or 6 letters. My regex kung-fu is not up to that job.
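A rough sketch of how such a regex could be generated follows. `fuzzy_regex` is a name invented for this example and is not my actual code. It assumes "one wrong letter" means any single position replaced by `.` and that a transposition swaps two adjacent letters; wrapping the result in `\b` word boundaries is also an assumption of the sketch.

```perl
use strict;
use warnings;

# Hypothetical helper: build one regex that matches $word exactly,
# with any single letter wrong, or with one adjacent pair transposed.
sub fuzzy_regex {
    my ($word) = @_;
    my @chars = map { quotemeta($_) } split //, $word;

    my @terms = (join '', @chars);          # the exact word

    for my $i (0 .. $#chars) {              # one wrong letter: .BCD A.CD ...
        my @c = @chars;
        $c[$i] = '.';
        push @terms, join '', @c;
    }

    for my $i (0 .. $#chars - 1) {          # one transposition: BACD ACBD ...
        my @c = @chars;
        @c[$i, $i + 1] = @c[$i + 1, $i];
        push @terms, join '', @c;
    }

    my %seen;                               # drop duplicates (e.g. doubled letters)
    @terms = grep { !$seen{$_}++ } @terms;

    my $alt = join '|', @terms;
    return qr/\b(?:$alt)\b/;                # compiled once, at run time
}

my $re = fuzzy_regex('ABCD');
for my $candidate (qw(ABCD AXCD BACD AXYD)) {
    printf "%s %s\n", $candidate, ($candidate =~ $re ? 'matches' : 'does not match');
}
```

The same function works for 3, 4, 5, or 6 letter words because the terms are generated from the word's length. Note that `.` also accepts non-letters; swapping it for a character class such as `[A-Z]` would tighten the match.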
In reply to Re: Multiple Regex evaluations or one big one?
by Marshall
in thread Multiple Regex evaluations or one big one?
by flyerhawk