in reply to RE: RE (tilly) 4: SAS log scanner
in thread SAS log scanner

With all of the benchmarks flying around, I don't see anyone testing the original suggestion of /this|that|etc/ without the trie fanciness.

    nada:     0.39 CPU @ 2.56/s (n=1)
    trie:     3.73 CPU @ 0.27/s (n=1)
    or:       6.04 CPU @ 0.17/s (n=1)
    alt_re:   8.52 CPU @ 0.12/s (n=1)
    alt_sub:  8.62 CPU @ 0.12/s (n=1)
    index:    9.17 CPU @ 0.11/s (n=1)
    list:    15.38 CPU @ 0.07/s (n=1)

Note that "or" is faster than either "alt" version, which was my original point. I was just curious whether that was still true, since I can no longer find the documentation that said so.

"nada" just reads the data. "trie" is tilly's stuff. "or" is my sub with lots of little regexen joined with ||. "alt_re" is one big regex with lots of alternates, compiled with the qr// operator. "alt_sub" is the same regex but wrapped in a subroutine and compiled with eval. "index" is my loop over the words using index(). "list" is a loop over lots of little regexen compiled with qr//.
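
For anyone who hasn't followed the whole thread, here is roughly the shape of each approach (leaving out "nada" and tilly's trie). This is only a sketch and not the code that was actually timed: the word list and the sub names (match_or, match_alt_re, and so on) are made up for illustration, and the real list of search strings is much longer.

```perl
use strict;
use warnings;

# Hypothetical search strings; the real list is not shown in this post.
my @words = qw(ERROR WARNING uninitialized);

# "or": lots of little literal regexes joined with ||
sub match_or {
    my ($line) = @_;
    return ( $line =~ /ERROR/
          || $line =~ /WARNING/
          || $line =~ /uninitialized/ ) ? 1 : 0;
}

# "alt_re": one big alternation, compiled once with qr//
my $alt    = join '|', map { quotemeta($_) } @words;
my $alt_re = qr/$alt/;
sub match_alt_re {
    my ($line) = @_;
    return $line =~ $alt_re ? 1 : 0;
}

# "alt_sub": the same alternation, but built as source text and
# compiled into an anonymous sub with string eval
my $alt_sub = eval "sub { \$_[0] =~ /$alt/ ? 1 : 0 }"
    or die "eval failed: $@";

# "index": loop over the words, using index() instead of a regex
sub match_index {
    my ($line) = @_;
    for my $w (@words) {
        return 1 if index( $line, $w ) >= 0;
    }
    return 0;
}

# "list": loop over lots of little regexes compiled with qr//
my @res = map { qr/\Q$_\E/ } @words;
sub match_list {
    my ($line) = @_;
    for my $re (@res) {
        return 1 if $line =~ $re;
    }
    return 0;
}

# All five should agree on any given line.
for my $line ( "NOTE: step done", "WARNING: value truncated" ) {
    print join( ' ',
        match_or($line),    match_alt_re($line), $alt_sub->($line),
        match_index($line), match_list($line) ), "  $line\n";
}
```

Note that "index" and "list" both walk the whole word list in Perl code for every line that doesn't match, while "or" and the "alt" versions leave that work to the regex engine.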

I think "index" and "list" lose here because of all of the looping for non-matching lines.

        - tye (but my friends call me "Tye")