in reply to Benchmarking "Are all these characters in this sentence?"

I see two interesting variables that you could vary: the length of the string (not just up to about 20 characters, but perhaps up to 20k), and the number of strings that each set of characters is tested against.

Currently the comparison is just a single data point in the big, two-dimensional benchmark space ;-)
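
To make the two dimensions concrete, here is a minimal sketch of such a sweep using the core Benchmark module. The helper names `all_chars_index` and `random_sentence`, the character set `aeiou`, and the particular lengths and counts are assumptions chosen for illustration; they are not taken from the benchmark code in this thread.

```perl
use strict;
use warnings;
use Benchmark qw(timeit timestr);

# Hypothetical candidate: true if every character of $chars occurs in $sentence.
sub all_chars_index {
    my ($chars, $sentence) = @_;
    for my $c (split //, $chars) {
        return 0 if index($sentence, $c) < 0;
    }
    return 1;
}

# Build a pseudo-random sentence of lowercase letters and spaces.
sub random_sentence {
    my ($len) = @_;
    my @alphabet = ('a' .. 'z', ' ');
    return join '', map { $alphabet[rand @alphabet] } 1 .. $len;
}

my $chars = 'aeiou';

# Dimension 1: sentence length. Dimension 2: number of sentences per character set.
for my $len (20, 2_000, 20_000) {
    for my $count (1, 100) {
        my @sentences = map { random_sentence($len) } 1 .. $count;
        my $t = timeit(50, sub {
            all_chars_index($chars, $_) for @sentences;
        });
        printf "len=%-6d strings=%-4d %s\n", $len, $count, timestr($t);
    }
}
```

The second dimension matters most for approaches that pay a one-time setup cost per character set (building a regex or a hash, say), since that cost is spread over more sentences as the count grows.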

Re^2: Benchmarking "Are all these characters in this sentence?"
by RMGir (Prior) on Aug 29, 2008 at 00:22 UTC
    Well, to be really thorough, we should also have some UTF-8 string tests. But I haven't done anything with UTF-8 in Perl, so I'll leave writing those cases to someone else.
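
For whoever picks this up, a UTF-8 test case might start out something like the sketch below. The helper `all_chars_index`, the sample sentence, and the expected results are made up for illustration and are not part of the benchmarked code. The strings are built from code-point escapes so they are character strings regardless of how the source file is saved; input read from a file or STDIN would first need decoding (for example with an `:encoding(UTF-8)` layer), otherwise the checks would run on bytes rather than characters.

```perl
use strict;
use warnings;

# Hypothetical helper for illustration; not one of the benchmarked subs.
sub all_chars_index {
    my ($chars, $sentence) = @_;
    for my $c (split //, $chars) {
        return 0 if index($sentence, $c) < 0;
    }
    return 1;
}

# "naive cafe" with accented letters, a smiley, and two CJK characters.
my $sentence = "na\x{EF}ve caf\x{E9} \x{263A} \x{4E2D}\x{6587}";

# Each case: [ character set, expected result ].
my @cases = (
    [ "\x{E9}\x{263A}", 1 ],   # e-acute and the smiley are both present
    [ "\x{4E2D}z",      0 ],   # CJK character present, 'z' absent
    [ "\x{E8}",         0 ],   # e-grave is absent (only e-acute occurs)
);

binmode STDOUT, ':encoding(UTF-8)';
for my $case (@cases) {
    my ($chars, $expected) = @$case;
    my $got = all_chars_index($chars, $sentence);
    printf "%-4s expected=%d got=%d\n", $chars, $expected, $got;
}
```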

    I've added a few sentences of roughly 3k characters, and a few search sets of roughly 3k characters. Adding the long cases shifts the results further in favour of Tanktalus_AllIndex:

    Short sentence and search set cases
    
    tallulah_OriginalPost            2054/s
    Tanktalus_AllRegex               2511/s
    Tanktalus_AllRegex_Study         2522/s
    moritz_BuildRegex_WithStudy      2595/s
    moritz_BuildRegex                2715/s
    varian_hash                      2983/s
    RMGir_slice                      4035/s
    Tanktalus_AllIndex               8219/s
    RMGir_index                     12107/s
    
    
    Long sentence and Short search set cases
    
    varian_hash                      97.2/s
    RMGir_slice                       115/s
    moritz_BuildRegex_WithStudy      3172/s
    tallulah_OriginalPost            3230/s
    Tanktalus_AllRegex_Study         3319/s
    Tanktalus_AllRegex               4054/s
    moritz_BuildRegex                4250/s
    Tanktalus_AllIndex              13032/s
    RMGir_index                     17612/s
    
    Short sentence and Long search set cases
    
    moritz_BuildRegex_WithStudy      54.1/s
    moritz_BuildRegex                54.6/s
    tallulah_OriginalPost            63.6/s
    Tanktalus_AllRegex               86.9/s
    Tanktalus_AllRegex_Study         87.1/s
    varian_hash                       161/s
    RMGir_index                       285/s
    RMGir_slice                       319/s
    Tanktalus_AllIndex                320/s
    
    Long sentence and Long search set cases
    
    moritz_BuildRegex_WithStudy      54.1/s
    moritz_BuildRegex                54.6/s
    tallulah_OriginalPost            63.6/s
    varian_hash                      86.9/s
    Tanktalus_AllRegex_Study         87.7/s
    Tanktalus_AllRegex               87.7/s
    RMGir_slice                       135/s
    RMGir_index                       250/s
    Tanktalus_AllIndex                319/s
    
    Here's the benchmark code with the added data points:

    Mike