Re: Benchmarking "Are all these characters in this sentence?"

I see two interesting variables that you could vary: the length of the string (not just up to about 20 characters, but perhaps up to 20k), and the number of strings that each set of characters is tested against.

Currently the comparison is just one arbitrary data point in the big, two-dimensional benchmark space ;-)
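
A rough sketch of such a sweep, using the core Benchmark module. This is not the benchmark code from the thread: `all_chars_present` is a hypothetical stand-in for whichever candidate is being measured, and the search set, lengths, and counts are made-up values chosen only to cover both axes.

```perl
use strict;
use warnings;
use Benchmark qw(timeit timestr);

# Hypothetical stand-in for a candidate: true if every character
# of $set occurs somewhere in $sentence.
sub all_chars_present {
    my ($set, $sentence) = @_;
    for my $ch (split //, $set) {
        return 0 if index($sentence, $ch) < 0;
    }
    return 1;
}

# Build a pseudo-random sentence of lowercase letters and spaces.
my @alphabet = ('a' .. 'z', ' ');
sub random_sentence {
    my ($len) = @_;
    return join '', map { $alphabet[rand @alphabet] } 1 .. $len;
}

my $set = 'aeiou';    # assumed search set, purely illustrative

for my $len (20, 2_000, 20_000) {       # axis 1: sentence length
    for my $count (1, 10, 100) {        # axis 2: sentences tested per set
        my @sentences = map { random_sentence($len) } 1 .. $count;
        # Time 20 passes over the whole batch of sentences.
        my $t = timeit(20, sub {
            all_chars_present($set, $_) for @sentences;
        });
        printf "len=%-6d count=%-4d %s\n", $len, $count, timestr($t);
    }
}
```

Each output line is one point in the length-by-count grid, so a candidate that only looks good at one corner of the grid shows up quickly.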

Replies are listed 'Best First'.
Re^2: Benchmarking "Are all these characters in this sentence?"
by RMGir (Prior) on Aug 29, 2008 at 00:22 UTC

Well, to get really thorough, we should also have some utf-8 string tests. But I haven't done anything with utf-8 in perl, so I'll leave writing those cases to someone else.

I've added a few 3k-ish character search strings, and a few 3k-ish character character sets... Doing the long cases moves the results more in favour of Tanktalus_AllIndex:

Short sentence and search set cases

tallulah_OriginalPost            2054/s
Tanktalus_AllRegex               2511/s
Tanktalus_AllRegex_Study         2522/s
moritz_BuildRegex_WithStudy      2595/s
moritz_BuildRegex                2715/s
varian_hash                      2983/s
RMGir_slice                      4035/s
Tanktalus_AllIndex               8219/s
RMGir_index                     12107/s

Long sentence and Short search set cases

varian_hash                      97.2/s
RMGir_slice                       115/s
moritz_BuildRegex_WithStudy      3172/s
tallulah_OriginalPost            3230/s
Tanktalus_AllRegex_Study         3319/s
Tanktalus_AllRegex               4054/s
moritz_BuildRegex                4250/s
Tanktalus_AllIndex              13032/s
RMGir_index                     17612/s

Short sentence and Long search set cases

moritz_BuildRegex_WithStudy      54.1/s
moritz_BuildRegex                54.6/s
tallulah_OriginalPost            63.6/s
Tanktalus_AllRegex               86.9/s
Tanktalus_AllRegex_Study         87.1/s
varian_hash                       161/s
RMGir_index                       285/s
RMGir_slice                       319/s
Tanktalus_AllIndex                320/s

Long sentence and Long search set cases

moritz_BuildRegex_WithStudy      54.1/s
moritz_BuildRegex                54.6/s
tallulah_OriginalPost            63.6/s
varian_hash                      86.9/s
Tanktalus_AllRegex_Study         87.7/s
Tanktalus_AllRegex               87.7/s
RMGir_slice                       135/s
RMGir_index                       250/s
Tanktalus_AllIndex                319/s

Mike
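
The candidate implementations themselves are not reproduced in this post, so the following is only a guess at what the three broad families (index, hash, regex) might look like. The sub names are hypothetical and none of this is the code that produced the numbers above. One plausible reading of the tables is that a hash-style check has to walk the whole sentence before it can answer, while an index-style check can stop at the first missing character.

```perl
use strict;
use warnings;

# Hypothetical index-based check: one index() call per wanted
# character, bailing out on the first miss.
sub has_all_index {
    my ($set, $sentence) = @_;
    for my $ch (split //, $set) {
        return 0 if index($sentence, $ch) < 0;
    }
    return 1;
}

# Hypothetical hash-based check: record every character of the
# sentence first, then look each wanted character up.
sub has_all_hash {
    my ($set, $sentence) = @_;
    my %seen;
    @seen{ split //, $sentence } = ();    # hash slice: keys only
    for my $ch (split //, $set) {
        return 0 unless exists $seen{$ch};
    }
    return 1;
}

# Hypothetical regex-based check: one match per wanted character,
# with \Q...\E so metacharacters in the set are taken literally.
sub has_all_regex {
    my ($set, $sentence) = @_;
    for my $ch (split //, $set) {
        return 0 unless $sentence =~ /\Q$ch\E/;
    }
    return 1;
}

my $sentence = 'the quick brown fox jumps over the lazy dog';
for my $set ('aeiou', 'xyz', 'q1') {
    printf "%-6s index=%d hash=%d regex=%d\n", $set,
        has_all_index($set, $sentence),
        has_all_hash($set, $sentence),
        has_all_regex($set, $sentence);
}
```

All three subs should agree on every input; only their cost differs, which is what the benchmark is measuring.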