Re: How Index function works??

Re^2: How Index function works?? by LanX (Saint) on Oct 24, 2011 at 08:22 UTC
When I ran speed tests comparing `index` with a simple `m//`, the regex (which uses Boyer-Moore) was mostly considerably faster. Since you are linking to v5.14.2 and I'm still using v5.10.0, I suppose that the implementation of `index` has changed. Cheers Rolf
Re^3: How Index function works?? by Corion (Patriarch) on Oct 24, 2011 at 08:41 UTC
According to perlreguts, the RE engine also uses `fbm_index()` to scan for the leftmost atom. There shouldn't be any reason why the performance of the two should differ by a large margin, and I would expect the regular expression to be a bit slower in the general case due to the setup. So I think it's either that your data somehow favours a branch in the RE engine that gets to `fbm_index` faster, or that the benchmark is not measuring what you want. But I also vaguely remember a thread about such a discrepancy on this site, ~~but I can't find it~~ namely "is index faster than regexp for fixed text token?". `git blame` tells me nobody has touched `index` since 2009, and that change was some refcounting change. The other changes were in 2006.
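One way to see which fixed substring the regex optimiser picks out of a pattern is the debug mode of the core `re` pragma. The following is only a minimal sketch: the haystack and the needle `xyzX` are made up for illustration, and the layout of the report differs between perl versions, so no output is shown here.

```perl
use strict;
use warnings;

# Hypothetical sample data: three alphabets followed by a single "X".
my $haystack = ( "abcdefghijklmnopqrstuvwxyz" x 3 ) . "X";

{
    # Enable regex debugging for this block only.
    # The compile and match reports are written to STDERR.
    use re 'debug';
    my $found = $haystack =~ /xyzX/;
    print "matched: ", ( $found ? "yes" : "no" ), "\n";
}
```

Running it with STDERR visible lets you read what the engine reports about the literal text it extracted from the pattern before the match starts.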
Re^3: How Index function works?? by davido (Cardinal) on Oct 24, 2011 at 08:26 UTC
If you don't mind, could you post the benchmark code? I'm just curious to look it over. Dave
Re^4: How Index function works?? by LanX (Saint) on Oct 24, 2011 at 09:15 UTC
Hi davido, well, it was in June; below is what I found on my disk. There is an "X" in the middle of a 52MB string of repeated alphabet letters (`$pattern = join "","a".."z"`). Depending on whether you look for "Xabc..xyz" or "abc...xyzX", the different approaches show their strengths. I did more tests, which I can't find anymore, that strongly indicated that `index` doesn't use Boyer-Moore. Just vary the position of the "X". I'd be glad if you looked it over. :)

    use Time::HiRes qw[ time ];

    my $pattern = join "","a".."z";
    my $str= $pattern x 1E6 . "X" .$pattern x1E6;

    $\="\n";   # append a newline to every print
    $|=1;      # unbuffered output

    print "Length: ",length $str;

    # needle ends in "X"
    print "\n---End X";
    $start=time;
    print "Match: ", $str =~/${pattern}X/;
    printf "\t took %.3f sec\n",time-$start;

    $start=time;
    print "Index: ", index $str , "${pattern}X";
    printf "\t took %.3f sec\n",time-$start;

    # needle starts with "X"
    print "\n---Start X";
    $start=time;
    print "Match: ", $str =~/X${pattern}/;
    printf "\t took %.3f sec\n",time-$start;

    $start=time;
    print "Index: ", index $str , "X${pattern}";
    printf "\t took %.3f sec\n",time-$start;

RESULT:

    Length: 52000001

    ---End X
    Match: 1
         took 0.021 sec
    Index: 25999974
         took 0.263 sec

    ---Start X
    Match: 1
         took 0.165 sec
    Index: 26000000
         took 0.094 sec

Cheers Rolf
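For anyone who wants to repeat the comparison with repeated runs instead of a single timing per case, here is a minimal sketch using the core `Benchmark` module. It keeps the same string layout, but the repeat count `$reps` is an assumption chosen to keep the run short, and the precompiled `qr//` objects are an illustrative choice, not part of the original test.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

# Same layout as the test above: alphabets, one "X", alphabets.
# $reps is a scaled-down, hypothetical size (about 2.6 MB in total).
my $pattern = join "", "a" .. "z";
my $reps    = 50_000;
my $str     = $pattern x $reps . "X" . $pattern x $reps;

my @cases = (
    [ "End X",   "${pattern}X", qr/${pattern}X/ ],
    [ "Start X", "X${pattern}", qr/X${pattern}/ ],
);

for my $case (@cases) {
    my ( $label, $needle, $re ) = @$case;
    print "--- $label\n";
    # A negative count means: run each sub for at least that many CPU seconds.
    cmpthese( -0.5, {
        index => sub { my $pos = index $str, $needle },
        regex => sub { my $hit = $str =~ $re },
    } );
}
```

`cmpthese` prints a rate table per case, which makes it easier to see whether the ordering of `index` and the regex really flips when the "X" moves from the end of the needle to the start.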
Re^5: How Index function works?? by Corion (Patriarch) on Oct 24, 2011 at 09:57 UTC
Re^6: How Index function works?? by LanX (Saint) on Oct 24, 2011 at 10:13 UTC
Re^3: How Index function works?? by saranrsm (Acolyte) on Oct 24, 2011 at 08:32 UTC
LanX, how big was your file? I used a file of 300 MB, where searching for a pattern took less than a second, but with a regex it took over 5 minutes and I had to stop the script. I am using perl v5.14.
Re^4: How Index function works?? by LanX (Saint) on Oct 24, 2011 at 09:22 UTC
52 MB, and as you can see from my test I'm constructing a case where Boyer-Moore must be slower than brute force. Cheers Rolf
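To see why the position of the "X" in the needle matters so much, it can help to count character comparisons instead of seconds. The sketch below is written under clear assumptions: `naive_search` and `horspool_search` are hypothetical helper names, the second one is a textbook Boyer-Moore-Horspool variant, and neither is Perl's actual implementation of `index` or of the regex engine. The string is also much smaller than the 52 MB one.

```perl
use strict;
use warnings;

# Brute force, left to right. Returns (position, comparisons); position is -1 if absent.
sub naive_search {
    my ( $text, $needle ) = @_;
    my ( $n, $m, $cmp ) = ( length $text, length $needle, 0 );
    for my $i ( 0 .. $n - $m ) {
        my $j = 0;
        while ( $j < $m ) {
            $cmp++;
            last if substr( $text, $i + $j, 1 ) ne substr( $needle, $j, 1 );
            $j++;
        }
        return ( $i, $cmp ) if $j == $m;
    }
    return ( -1, $cmp );
}

# Textbook Boyer-Moore-Horspool: compare right to left, then shift by a
# bad-character table keyed on the text character under the needle's last position.
sub horspool_search {
    my ( $text, $needle ) = @_;
    my ( $n, $m, $cmp ) = ( length $text, length $needle, 0 );
    my %shift;
    $shift{ substr( $needle, $_, 1 ) } = $m - 1 - $_ for 0 .. $m - 2;
    my $i = 0;
    while ( $i <= $n - $m ) {
        my $j = $m - 1;
        while ( $j >= 0 ) {
            $cmp++;
            last if substr( $text, $i + $j, 1 ) ne substr( $needle, $j, 1 );
            $j--;
        }
        return ( $i, $cmp ) if $j < 0;
        # Characters that do not occur in the needle's first m-1 positions shift by m.
        $i += $shift{ substr( $text, $i + $m - 1, 1 ) } // $m;
    }
    return ( -1, $cmp );
}

my $pattern = join "", "a" .. "z";
my $text    = $pattern x 1000 . "X" . $pattern x 1000;

for my $needle ( "${pattern}X", "X${pattern}" ) {
    my ( $npos, $ncmp ) = naive_search( $text, $needle );
    my ( $hpos, $hcmp ) = horspool_search( $text, $needle );
    printf "%s...: naive pos %d, %d comparisons; horspool pos %d, %d comparisons\n",
        substr( $needle, 0, 4 ), $npos, $ncmp, $hpos, $hcmp;
}
```

Comparison counts are not timings, since the per-step overhead of the two loops differs, but they show how strongly the right-to-left scan depends on whether the rare character sits at the end of the needle or at its start.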