in reply to Re: is index faster than regexp for fixed text token?
in thread is index faster than regexp for fixed text token?

I don't know, I definitely get a big difference. This is with Perl 5.10.0
Without study:

         Rate  index regexp
index   466/s     --   -27%
regexp  639/s    37%     --

With study:

          Rate  index regexp
index    467/s     --   -54%
regexp  1019/s   118%     --
Doesn't it make sense that regexp has an advantage in 5.10 since it is building a trie?
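For context, a comparison along these lines could be set up with Benchmark's cmpthese; the haystack and needle strings below are made up purely for illustration (and note that study became a no-op in later Perls, 5.16 onward):

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

# Hypothetical test data: a fixed token buried in a long string.
my $haystack = ('x' x 10_000) . 'needle' . ('y' x 10_000);
my $needle   = 'needle';

study $haystack;    # pre-analyze the string (no-op on Perl 5.16+)

cmpthese(-1, {
    index  => sub { index($haystack, $needle) },
    regexp => sub { $haystack =~ /\Qneedle\E/ },
});
```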

SSF

Re^3: is index faster than regexp for fixed text token?
by Marshall (Canon) on Jul 06, 2009 at 07:24 UTC
    I got similar results...index faster on my machine!

              Rate regexp  index
    regexp   848/s     --   -74%
    index   3269/s   286%     --
    I am unsure why ikegami's machine does so well on regex. His machine is 64-bit rather than 32-bit, and while I don't see why that alone should make so much difference, there may be something about 64-bit builds that speeds things up considerably.

    I am running Perl 5.10, which is significantly better at regexes than Perl 5.8. There could also be other factors related to the power of ikegami's machine, like more L2 cache.

    UPDATE

    I ran with longer strings to be searched, and the result does appear to approach the same execution time for both cases:

             Rate  index regexp
    index   470/s     --    -5%
    regexp  494/s     5%     --

    This is an interesting finding, and I am not sure why it happens. My guess is that when the string to be searched is relatively "small" (roughly tens of thousands of characters), index() wins (caveat: in this "search for fixed string X" situation!) because, although "dumb", it is fast. But at some point the more computationally expensive regex algorithm gains ground.

    This is an interesting question for which I have no general heuristic. I think it depends upon length of string to be searched, length of string that we are looking for, the data in each string and perhaps a lot more!
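    One way to probe that guess would be to benchmark across several haystack lengths; the sizes below are arbitrary, chosen only to span the "small" to "large" range being discussed:

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

# Run the same index-vs-regexp comparison at increasing string sizes
# to see where (or whether) the two approaches converge.
for my $len (1_000, 100_000, 10_000_000) {
    my $haystack = ('x' x $len) . 'needle';
    print "--- haystack length: $len ---\n";
    cmpthese(-1, {
        index  => sub { index($haystack, 'needle') },
        regexp => sub { $haystack =~ /needle/ },
    });
}
```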

      Actually, I seem to remember hearing that 5.10's regex engine has a higher initialisation cost (due to the changes that moved away from recursion), but maybe I'm thinking of tries (which do have a higher compilation cost).
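      For reference, the trie optimization applies to alternations of literal strings; a pattern like the sketch below (the words are arbitrary) is the kind that 5.10+ compiles into a trie, which can be confirmed by looking for TRIE nodes in the compile dump from the re pragma:

```perl
use strict;
use warnings;

# Uncommenting the next line dumps the compiled program at compile time;
# on Perl 5.10+ this alternation of literals appears as a TRIE-EXACT node.
# use re 'debug';
my $re = qr/foo|food|fool|footer/;

print "matched\n" if 'footer' =~ $re;
```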

      I was using 5.8.8, 32-bit threaded. My machine is somewhat old, a P4 without even hyper-threading (which had just come out).

      I think 5.10.0 gave me similar results, but I'm not sure.

        From my experience: yes, the Perl 5.10 regex engine is faster than its 5.8 equivalent.

        The situation here is that we are working with contrived, non-real-world data sets. It is possible to get fooled: benchmarks on the test data may not reflect actual real-world performance.

        My machine is one of the early "hyper-threaded" things. It looks like two CPUs to the O/S, but there is a huge memory bottleneck to the CPUs. If I have a process that takes 1 hour and I fire up two of them, it won't take 2 hours to do them both; it will take about 1.6 hours. I will also say that when I do something like that, my computer turns into a "space heater". In the winter I run SETI@home or one of the BOINC projects all the time, figuring that if I am going to heat the apartment, I might as well try to do something useful.

        Newer machines will complete both 1-hour CPU tasks in close to 1 hour (about 2x the throughput for CPU-bound jobs).

        Anyway, tuning an app that is I/O bound (where regex, split, etc. matter) has a lot to do with the actual data sets. Is there somewhere on the web where you could put an actual data set for us to work on optimizations?