in reply to Re: Regexp for range of numbers
in thread Regexp for range of numbers

So, stringing together a somewhat large number of static ORs in a regular expression is efficient?
-Scott

Replies are listed 'Best First'.
Re^3: Regexp for range of numbers
by Corion (Patriarch) on Apr 05, 2005 at 15:55 UTC

    Please note that I mentioned 5.9.2, the most current release of the developer branch, which has a patch by demerphq, that builds a very optimized tree out of a list like mine. The tree (actually a Trie) can the optimally match/compare against the string, by looking at exactly the first three chars of the string and quickly decide if it matches or not.

      Very cool. So is there detailed information available somewhere about this new feature? I imagine it's doing some hash or binary tree type searches...

      -Scott

        The relevant perldelta briefly touches on the subject of the Trie optimization for regexes. Basically, I think, the idea is that you convert the constant part of the regex into hashes of hashes that you can then walk very quickly (if you are using C instead of Perl), like the following hash to match the numbers (11,122,233,31) (except that the following is not valid Perl syntax as the hash needs to be created step-by-step to make the internal forward references work):

        my %trie; %trie = ( 1 => { 1 => 'accept', 2 => { 2 => 'accept', 3 => \%trie{2}->{3}, # since we already saw a 23 _default => \%trie, } _default => \%trie }, 2 => { 3 => { 3 => 'accept', _default => \%trie{3}, # we already saw a 3 }, _default => \%trie }, 3 => { 1 => 'accept', _default => \%trie }, _default => \%trie; ); # hash walker omitted

        The next step will likely be Aho-Corasick optimizations for regexes, which you can watch animated here.

        Updated: After a discussion with bart, I expanded the example to shed more light on the importance of links within the %trie structure. If a match fails for 122, we can continue trying to match 223 and store the knowledge that we've already looked at 22 in the deep link within %trie.