Re^4: is index faster than regexp for fixed text token?

Actually, I seem to remember hearing that 5.10's regex engine has a higher initialisation cost (due to the changes to move away from recursion), but maybe I'm thinking of tries (which do have a higher compilation time).

I was using 5.8.8, 32-bit threaded. My machine is somewhat old, a P4 without even hyper-threading (which had just come out).

I think 5.10.0 gave me similar results, but I'm not sure.

Replies are listed 'Best First'.
Re^5: is index faster than regexp for fixed text token?
by Marshall (Canon) on Jul 06, 2009 at 17:12 UTC

From my experience: yes, the Perl 5.10 regex engine is faster than the equivalent thing on 5.8.

The situation here is that we are working with "contrived", non-real-world data sets. It is possible to "get fooled", meaning that benchmarks on the test data may not reflect the actual real-world performance.

My machine is one of the early "hyper-threaded" things. It looks like 2 CPUs to the O/S, but there is a huge memory bottleneck between them. If I have a process that takes 1 hour and I fire up 2 of them, it won't take 2 hours to do them both. It will take more like 1.6 hours. I will also say that when I do something like that, my computer turns into a "space heater". In the winter time, I run SETI@home or one of the BOINC projects all the time, figuring that if I am going to heat the apartment, I might as well try to do something useful.

The newer machines will complete both 1 hour CPU tasks in close to 1 hour (about 2x for CPU compute bound jobs).

Anyway, tuning an app that is I/O bound (where regex, split, etc. matter) has a lot to do with the actual data sets. Is there some web place where you could put an actual data set for us to work on optimizations?
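
To make the point about contrived versus real data concrete, here is a minimal sketch of how one might time index against a regex with the core Benchmark module. The token 'ERROR' and the made-up fallback lines are assumptions for illustration only; pass the name of a real data file as the first argument to measure something meaningful.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

# Hypothetical fixed token; substitute a literal string from your own data.
my $token = 'ERROR';
my $re    = qr/\Q$token\E/;    # \Q...\E keeps the token literal

# Use a real file if one is named on the command line. Otherwise fall back
# to contrived lines, which is exactly the kind of input that can mislead.
my @lines;
if (@ARGV) {
    open my $fh, '<', $ARGV[0] or die "Cannot open $ARGV[0]: $!";
    @lines = <$fh>;
    close $fh;
}
else {
    for my $i (1 .. 10_000) {
        push @lines, $i % 10 ? "line $i nothing to see\n" : "line $i ERROR here\n";
    }
}

# Count matching lines with index(); it returns -1 when the token is absent.
sub count_index {
    my $n = 0;
    for my $line (@lines) { $n++ if index($line, $token) >= 0 }
    return $n;
}

# Same count using the precompiled regex.
sub count_regex {
    my $n = 0;
    for my $line (@lines) { $n++ if $line =~ $re }
    return $n;
}

# Both approaches must agree before the timings mean anything.
die "index and regex disagree\n" unless count_index() == count_regex();

# Run each sub for at least 1 CPU second and print a comparison table.
cmpthese(-1, { index => \&count_index, regex => \&count_regex });
```

Run it once with no arguments and once against a real file. If the ranking or the gap changes between the two runs, that is the "getting fooled" effect described above. Results will also differ between 5.8 and 5.10, so it is worth running under each perl you care about.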
