Re^3: How Index function works??

Re^4: How Index function works?? by LanX (Saint) on Oct 24, 2011 at 09:15 UTC
Hi davido,

well, it was in June; below is what I found on my disk.

There is an "X" in the middle of a 52MB string of repeated alphabet letters (`$pattern = join "","a".."z"`). Depending on whether you look for "Xabc..xyz" or "abc...xyzX", the different approaches show their strengths.

I did more tests, which I can't find anymore, strongly indicating that index doesn't use Boyer-Moore. Just vary the position of the "X".

I'd be glad if you looked it over. :)

    use Time::HiRes qw[ time ];

    my $pattern = join "","a".."z";
    my $str= $pattern x 1E6 . "X" .$pattern x1E6;

    $\="\n";
    $|=1;

    print "Length: ",length $str;

    print "\n---End X";

    $start=time;
    print "Match: ", $str =~/${pattern}X/;
    printf "\t took %.3f sec\n",time-$start;

    $start=time;
    print "Index: ", index $str , "${pattern}X";
    printf "\t took %.3f sec\n",time-$start;

    print "\n---Start X";

    $start=time;
    print "Match: ", $str =~/X${pattern}/;
    printf "\t took %.3f sec\n",time-$start;

    $start=time;
    print "Index: ", index $str , "X${pattern}";
    printf "\t took %.3f sec\n",time-$start;

RESULT:

    Length: 52000001

    ---End X
    Match: 1
             took 0.021 sec
    Index: 25999974
             took 0.263 sec

    ---Start X
    Match: 1
             took 0.165 sec
    Index: 26000000
             took 0.094 sec

Cheers Rolf
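The suggestion to "just vary the position of the X" could be scripted roughly as follows. This is only a sketch: the helper names `haystack` and `timed`, the variable `$copies`, and the reduced string size are assumptions made for illustration and are not part of the original test. It also runs each search only once, so the timings are as noisy as in the script above.

```perl
use strict;
use warnings;
use Time::HiRes qw[ time ];

my $pattern = join "", "a" .. "z";
my $copies  = 200_000;    # assumed size, smaller than the 2E6 copies in the post

# Haystack with a single "X" placed after $before copies of the alphabet.
sub haystack {
    my ($before) = @_;
    return $pattern x $before . "X" . $pattern x ( $copies - $before );
}

# Run a code ref once; return its result and the elapsed seconds.
sub timed {
    my ($code) = @_;
    my $start  = time;
    my $result = $code->();
    return ( $result, time - $start );
}

# Try the "X" near the start, in the middle, and near the end of the string.
for my $before ( 1, $copies / 2, $copies - 1 ) {
    my $str = haystack($before);
    for my $needle ( "${pattern}X", "X${pattern}" ) {
        my ( $pos, $t_index ) = timed( sub { index $str, $needle } );
        my ( $hit, $t_regex ) = timed( sub { $str =~ /\Q$needle\E/ ? 1 : 0 } );
        printf "X after %6d copies, needle %s: index %d in %.3f sec, match %d in %.3f sec\n",
            $before, ( $needle =~ /^X/ ? "X+abc" : "abc+X" ),
            $pos, $t_index, $hit, $t_regex;
    }
}
```

Note that the regex timing here includes compiling the interpolated pattern, which the literal patterns in the script above do not pay at run time.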
Re^5: How Index function works?? by Corion (Patriarch) on Oct 24, 2011 at 09:57 UTC
It's not that index doesn't use Boyer-Moore; it's more that the regex engine doesn't always use Boyer-Moore. If the regex engine decides to use `screaminstr`, it will use that instead of Boyer-Moore. I guess that if you change your data to be more favorable to Boyer-Moore, the results will reverse, as the regular expression will still not use Boyer-Moore.

Update: Even weirder. Looking at the log of `use re 'Debug','ALL';`, it claims that `fbm_instr()` is used:

    C:\Projekte>perl -Mre=Debug,ALL -e "shift=~/abcdefghijklmnopqrstuvwxyzX/" abcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzXabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyz
    Compiling REx "abcdefghijklmnopqrstuvwxyzX"
    Starting first pass (sizing)
    >abcdefghij... | 1| reg
                   |  | brnc
                   |  | piec
                   |  | atom
    Required size 9 nodes
    Starting second pass (creation)
    >abcdefghij... | 1| reg
                   |  | brnc
                   |  | piec
                   |  | atom
    ><             | 10| tail~ EXACT <abcdefghijklmnopqrstuvwxyzX> (1) -> END
    first:> 1: EXACT <abcdefghijklmnopqrstuvwxyzX> (9)
    first at 1
    Peep:Pos:0/0 Flags: 0x0 Whilem_c: 0 Lcp: 0 Last:'' 0:0/0 Fixed:'' @ 0 Float: '' @ 0/0
    Peep> 1: EXACT <abcdefghijklmnopqrstuvwxyzX> (9)
    join> 1: EXACT <abcdefghijklmnopqrstuvwxyzX> (9)
    pre-fin:Pos:27/0 Flags: 0x0 Whilem_c: 0 Lcp: 0 Last:'abcdefghijklmnopqrstuvwxyzX' 27:0/0 Fixed:'' @ 0 Float: '' @ 0/0
    post-fin:Pos:27/0 Flags: 0x0 Whilem_c: 0 Lcp: 0 Last:'abcdefghijklmnopqrstuvwxyzX' 27:0/0 Fixed:'' @ 0 Float: '' @ 0/0
    commit: Pos:27/0 Flags: 0x0 Whilem_c: 0 Lcp: 0 Last:'abcdefghijklmnopqrstuvwxyzX' -1:0/0 Fixed:'abcdefghijklmnopqrstuvwxyzX' @ 0 Float: '' @ 0/0
    minlen: 27 r->minlen:0
    Final program:
       1: EXACT <abcdefghijklmnopqrstuvwxyzX> (9)
       9: END (0)
    anchored "abcdefghijklmnopqrstuvwxyzX" at 0 (checking anchored isall) minlen 27
    r->extflags: CHECK_ALL USE_INTUIT_NOML USE_INTUIT_ML
    Guessing start of match in sv for REx "abcdefghijklmnopqrstuvwxyzX" against "abcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefgh"...
    Check offset min: 0 Start shift: 0 End shift 0 Real End Shift: 0
    fbm_instr len=157 str=<abcdefghijklmnopqrst>
    Found anchored substr "abcdefghijklmnopqrstuvwxyzX" at offset 78...
    Check offset min:0 max:0 S:78 t:78 D:0 end:157
    Starting position does not contradict /^/m...
    Guessed: match at offset 78
    Freeing REx: "abcdefghijklmnopqrstuvwxyzX"

So, I don't understand what happens, or how the RE engine is faster than index when they both fire up `fbm_instr`.
Re^6: How Index function works?? by LanX (Saint) on Oct 24, 2011 at 10:13 UTC
I can't follow... could you plz show me what you think is a fair test? Cheers Rolf
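As background for the Boyer-Moore argument in this thread, here is a rough sketch of why the position of the "X" inside the needle would matter to a Boyer-Moore style search. It implements the simplified Horspool variant in pure Perl and counts character comparisons. The function name `horspool` and the small test string are assumptions for illustration only; this is not how `index`, `fbm_instr` or the regex engine are implemented in perl, so it says nothing by itself about which of them was used in the timings above.

```perl
use strict;
use warnings;

# Horspool search (a simplified Boyer-Moore variant).
# Returns (offset or -1, number of character comparisons made).
sub horspool {
    my ( $hay, $needle ) = @_;
    my ( $n, $m ) = ( length $hay, length $needle );
    return ( 0, 0 ) if $m == 0;

    # Shift table: distance from the last occurrence of each character
    # (ignoring the final needle position) to the end of the needle.
    my %shift;
    $shift{ substr( $needle, $_, 1 ) } = $m - 1 - $_ for 0 .. $m - 2;

    my ( $pos, $compares ) = ( 0, 0 );
    while ( $pos + $m <= $n ) {
        # Compare the needle against the current window, right to left.
        my $i = $m - 1;
        while ( $i >= 0 ) {
            $compares++;
            last if substr( $hay, $pos + $i, 1 ) ne substr( $needle, $i, 1 );
            $i--;
        }
        return ( $pos, $compares ) if $i < 0;

        # Shift by the table entry for the haystack character under the
        # last needle position; characters not in the table skip the
        # whole needle length.
        my $last = substr( $hay, $pos + $m - 1, 1 );
        $pos += $shift{$last} // $m;
    }
    return ( -1, $compares );
}

my $pattern = join "", "a" .. "z";
my $str     = $pattern x 1000 . "X" . $pattern x 1000;

for my $needle ( "${pattern}X", "X${pattern}" ) {
    my ( $pos, $compares ) = horspool( $str, $needle );
    printf "needle %s: offset %d (index says %d), %d comparisons\n",
        ( $needle =~ /^X/ ? "X+abc" : "abc+X" ),
        $pos, index( $str, $needle ), $compares;
}
```

With this kind of data the needle ending in "X" mismatches on the first comparison in almost every window, while the needle starting with "X" often matches 26 characters before failing, so the comparison counts differ considerably between the two needles.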