in reply to is index faster than regexp for fixed text token?

I get the same performance for both:
         Rate  index regexp
index   299/s     --    -5%
regexp  316/s     6%     --

         Rate regexp  index
regexp  294/s     --   -12%
index   332/s    13%     --

         Rate regexp  index
regexp  303/s     --    -3%
index   313/s     3%     --

That's what I'd expect, since index and regex matches use the same algorithm (Boyer-Moore) to find constant strings.
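For readers unfamiliar with the technique, here is a minimal Python sketch of Boyer-Moore-Horspool, a simplified Boyer-Moore variant that uses only the "bad character" shift table. It illustrates the general idea of skipping ahead with a precomputed table; it is not Perl's actual C implementation, and the function name horspool_find is hypothetical.

```python
def horspool_find(haystack: str, needle: str) -> int:
    """Return the index of the first occurrence of needle in haystack, or -1.

    Boyer-Moore-Horspool: a simplified Boyer-Moore that uses only the
    "bad character" shift table.
    """
    m, n = len(needle), len(haystack)
    if m == 0:
        return 0  # same convention as str.find
    # Shift table: for each character of needle except the last, the distance
    # from its rightmost occurrence to the end of the needle.
    shift = {ch: m - 1 - i for i, ch in enumerate(needle[:-1])}
    pos = 0
    while pos <= n - m:
        # Compare the current window right to left.
        j = m - 1
        while j >= 0 and haystack[pos + j] == needle[j]:
            j -= 1
        if j < 0:
            return pos
        # Skip ahead based on the haystack character under the window's last
        # slot; characters absent from the table allow a full-length jump.
        pos += shift.get(haystack[pos + m - 1], m)
    return -1


if __name__ == "__main__":
    text = "the quick brown fox jumps over the lazy dog"
    for word in ("fox", "lazy", "cat"):
        # Cross-check against Python's built-in str.find.
        print(word, horspool_find(text, word), text.find(word))
```

Because the window is compared right to left and can jump by up to len(needle) characters on a mismatch, longer needles tend to allow bigger skips.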

Re^2: is index faster than regexp for fixed text token?
by sflitman (Hermit) on Jul 06, 2009 at 05:06 UTC
    I don't know; I definitely get a big difference. This is with Perl 5.10.0:
    Without study:

             Rate  index regexp
    index   466/s     --   -27%
    regexp  639/s    37%     --

    With study:

             Rate  index regexp
    index   467/s     --   -54%
    regexp 1019/s   118%     --
    Doesn't it make sense that regexp has an advantage in 5.10 since it is building a trie?

    SSF

      I got a similarly big difference, but index is faster on my machine!

               Rate regexp  index
      regexp  848/s     --   -74%
      index  3269/s   286%     --
      I am unsure why Ikegami's machine does so well on regex. He has a 64-bit machine versus my 32-bit one; I don't see why that should make so much difference, but there could be something about 64-bit vs. 32-bit that speeds things up a lot.

      I am running Perl 5.10, which is significantly better at regexes than Perl 5.8. I suppose other properties of Ikegami's machine, such as a larger L2 cache, could also play a part.

      UPDATE

      I ran it with longer strings to be searched, and the two cases do appear to approach the same execution time:

               Rate  index regexp
      index   470/s     --    -5%
      regexp  494/s     5%     --

      This is an interesting finding, and I am not sure why it happens. My guess is that when the string being searched is relatively "small" (some 10,000+ characters), index() works better (caveat: in this "search for string X" situation!) because, although "dumb", it is fast. But at some point the more "computationally expensive" regex algorithm gains ground.

      I have no general heuristic for this. I think it depends on the length of the string being searched, the length of the string we are looking for, the data in each string, and perhaps a lot more!
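      To illustrate that the winner depends on needle length and on the data, the Python sketch below counts character comparisons (not wall-clock time) for a brute-force left-to-right scan and for Boyer-Moore-Horspool. Both function names are hypothetical, and neither models exactly what Perl's index() or regex engine does internally.

```python
def naive_comparisons(haystack: str, needle: str) -> int:
    """Character comparisons made by a brute-force left-to-right scan,
    up to and including the first match (or the end of the haystack)."""
    m, n = len(needle), len(haystack)
    count = 0
    for pos in range(n - m + 1):
        for j in range(m):
            count += 1
            if haystack[pos + j] != needle[j]:
                break
        else:
            return count  # every character matched: first occurrence found
    return count


def horspool_comparisons(haystack: str, needle: str) -> int:
    """Character comparisons made by Boyer-Moore-Horspool, same stopping rule."""
    m, n = len(needle), len(haystack)
    # Bad-character shift table built from all but the last needle character.
    shift = {ch: m - 1 - i for i, ch in enumerate(needle[:-1])}
    count = 0
    pos = 0
    while pos <= n - m:
        j = m - 1
        while j >= 0:  # right-to-left comparison of the current window
            count += 1
            if haystack[pos + j] != needle[j]:
                break
            j -= 1
        if j < 0:
            return count
        pos += shift.get(haystack[pos + m - 1], m)
    return count


if __name__ == "__main__":
    cases = [
        ("short needle, varied text", "abcdefghij" * 1000 + "xyz", "xyz"),
        ("long needle, varied text",
         "abcdefghij" * 1000 + "klmnopqrstuvwxyz", "klmnopqrstuvwxyz"),
        ("adversarial text", "a" * 10000, "baaaa"),
    ]
    for label, text, word in cases:
        print(f"{label}: naive={naive_comparisons(text, word)} "
              f"horspool={horspool_comparisons(text, word)}")
```

      With the 3-character needle, Horspool makes 3337 comparisons versus 10003 for brute force, and with the 16-character needle it drops to 641. On "a" * 10000 searching for "baaaa", though, brute force needs only 9996 comparisons while Horspool needs 49980, because every window matches four characters from the right before failing.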

        Actually, I seem to remember hearing that 5.10's regex engine has a higher initialisation cost (due to the changes that moved it away from recursion), but maybe I'm thinking of tries (which do have a higher compilation time).

        I was using 5.8.8, 32-bit threaded. My machine is somewhat old, a P4 without even hyper-threading (which had just come out).

        I think 5.10.0 gave me similar results, but I'm not sure.

Re^2: is index faster than regexp for fixed text token?
by sflitman (Hermit) on Jul 06, 2009 at 05:07 UTC
    I don't know; I definitely get a big difference. This is with Perl 5.10.0:
    Without study:

             Rate  index regexp
    index   466/s     --   -27%
    regexp  639/s    37%     --

    With study:

             Rate  index regexp
    index   467/s     --   -54%
    regexp 1019/s   118%     --
    Doesn't it make sense that regexp has an advantage in 5.10 since it is building a trie like Regexp::Assemble?

    SSF

      There's no trie involved since there's no alternation.
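      Perl 5.10 can compile an alternation of literal strings (for example /foo|foobar|bar/) into a trie, so shared prefixes are examined only once. The toy Python sketch below uses hypothetical names and is not Perl's regex compiler. It shows why a single fixed string gains nothing: its trie is one chain with no branch points, so walking it is the same as comparing the literal character by character. For simplicity, matches_at returns every alternative that matches rather than following Perl's leftmost-alternative rule.

```python
from typing import Any, Dict, List

END = None  # key marking "a complete alternative ends at this node"


def build_trie(words: List[str]) -> Dict[Any, Any]:
    """Build a character trie from literal alternatives."""
    root: Dict[Any, Any] = {}
    for word in words:
        node = root
        for ch in word:
            node = node.setdefault(ch, {})
        node[END] = word
    return root


def matches_at(trie: Dict[Any, Any], text: str, start: int) -> List[str]:
    """Every alternative that matches text starting at index start,
    shortest first. Shared prefixes are walked only once."""
    found: List[str] = []
    node, i = trie, start
    while True:
        if END in node:
            found.append(node[END])
        if i >= len(text) or text[i] not in node:
            return found
        node = node[text[i]]
        i += 1


def count_branch_points(trie: Dict[Any, Any]) -> int:
    """Number of nodes with more than one outgoing character edge."""
    edges = [k for k in trie if k is not END]
    own = 1 if len(edges) > 1 else 0
    return own + sum(count_branch_points(trie[k]) for k in edges)


if __name__ == "__main__":
    alternation = build_trie(["foo", "foobar", "bar"])  # like /foo|foobar|bar/
    print(matches_at(alternation, "foobarbaz", 0))      # ['foo', 'foobar']
    print(count_branch_points(alternation))             # 1
    single = build_trie(["foobar"])                     # like /foobar/
    print(count_branch_points(single))                  # 0
```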