
Re^2: Multiple Regex's on a Big Sequence - Benchmark

by hv (Parson)
on Aug 17, 2006 at 11:21 UTC (#567889)

in reply to Re: Multiple Regex's on a Big Sequence - Benchmark
in thread Multiple Regex's on a Big Sequence

For the cases where you compare multiple regexps against your target string, it may save time if you also study($sequence) before starting the matches.
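
A minimal sketch of that usage, assuming the patterns are applied one after another to the same unmodified string. The helper `count_matches`, the sample `$sequence` and the three patterns are hypothetical, made up for illustration. study() never changes which matches are found; it can only affect how fast they run. Note also that since perl 5.16 study() has been a no-op, so any speedup can only show up on older perls.

```perl
use strict;
use warnings;

# Hypothetical helper: count the non-overlapping matches of each pattern in
# $seq, optionally calling study() once before any of the matches run.
# study() does not change match results, only (possibly) their speed;
# on perl 5.16 and later it does nothing at all.
sub count_matches {
    my ($seq, $patterns, $use_study) = @_;
    study($seq) if $use_study;
    my @counts;
    for my $re (@$patterns) {
        my $n = 0;
        $n++ while $seq =~ /$re/g;    # walk every match with /g
        push @counts, $n;
    }
    return @counts;
}

# Hypothetical sample data: a short repeated unit stands in for a real sequence.
my $sequence = 'GAATTCACGT' x 100;
my @patterns = (qr/GAATTC/, qr/CG/, qr/ACGTGA/);

my @plain   = count_matches($sequence, \@patterns, 0);
my @studied = count_matches($sequence, \@patterns, 1);
print "@plain\n@studied\n";    # prints "100 100 99" on both lines
```

To compare timings, each call can be wrapped in an anonymous sub and handed to cmpthese from the core Benchmark module.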

This does a single scan of the sequence so that subsequent matches can use the Boyer-Moore algorithm: it builds a linked list of the positions of each distinct character in the sequence, and then uses character-frequency data to pick the pattern's rarest character, so that only the list for that character needs to be walked.
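
The idea can be illustrated with a toy sketch. This is not perl's actual implementation: arrays stand in for the linked lists, only fixed strings (not regexes) are searched, and the "rarest" character is assumed to be the one with the fewest occurrences in the indexed string itself. `build_index` and `find_all` are hypothetical names.

```perl
use strict;
use warnings;

# Index every position of each distinct character in $seq.
# (Arrays stand in for the per-character linked lists.)
sub build_index {
    my ($seq) = @_;
    my %pos;
    push @{ $pos{ substr($seq, $_, 1) } }, $_ for 0 .. length($seq) - 1;
    return \%pos;
}

# Find every (possibly overlapping) start offset of the fixed string
# $needle by walking only the position list of its rarest character.
sub find_all {
    my ($seq, $index, $needle) = @_;
    return () unless length $needle;

    # Choose the needle character with the shortest position list,
    # remembering where that character sits inside the needle.
    my ($best, $best_off);
    for my $off (0 .. length($needle) - 1) {
        my $list = $index->{ substr($needle, $off, 1) } || [];
        ($best, $best_off) = ($list, $off)
            if !defined $best || @$list < @$best;
    }

    # Each occurrence of that character is one candidate alignment;
    # confirm it with a direct substring comparison.
    my @hits;
    for my $p (@$best) {
        my $start = $p - $best_off;
        next if $start < 0 || $start + length($needle) > length($seq);
        push @hits, $start if substr($seq, $start, length $needle) eq $needle;
    }
    return @hits;
}

my $s   = 'ACGTTGCAACGT';
my $idx = build_index($s);
print join(',', find_all($s, $idx, 'ACGT')), "\n";    # prints "0,8"
```

The number of candidates checked equals the length of one position list, which is why the payoff depends on how rare the chosen character is.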

Because the main benefit of this approach comes from rarity, it may not be a big win in a case like this, where the string uses only a 4-character alphabet and (presumably) each character occurs roughly 1/4 of the time; I'd be interested to see how it affects the benchmarks.
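
A rough back-of-the-envelope check of that concern, assuming candidates are counted as the positions in the string holding the pattern's rarest character (measured by counts in the string itself). `candidate_fraction` is a hypothetical helper, and the two sample strings are made up: a uniform 4-letter sequence versus repeated English text.

```perl
use strict;
use warnings;

# Fraction of positions in $seq that hold the pattern's rarest character,
# i.e. the share of candidate alignments a rarest-character scan must check.
sub candidate_fraction {
    my ($seq, $pattern) = @_;
    return 0 unless length $seq;
    my %count;
    $count{$_}++ for split //, $seq;
    my $min;
    for my $c (split //, $pattern) {
        my $n = $count{$c} || 0;
        $min = $n if !defined $min || $n < $min;
    }
    return ($min // 0) / length $seq;
}

# Uniform 4-letter alphabet: every character is equally common.
printf "dna:     %.4f\n", candidate_fraction('ACGT' x 250, 'GATTACA');
# English text: 'lazy' contains letters that occur only once per sentence.
printf "english: %.4f\n",
    candidate_fraction('the quick brown fox jumps over the lazy dog ' x 20, 'lazy');
```

With a uniform 4-letter alphabet a quarter of all positions remain candidates no matter which pattern character is chosen, whereas the English sample leaves only a few percent.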


Re^3: Multiple Regex's on a Big Sequence - Benchmark
by bernanke01 (Beadle) on Aug 18, 2006 at 02:02 UTC
    Great idea, I'll add it to the next round of Benchmarks.
