in reply to Re^4: Regexp for range of numbers
in thread Regexp for range of numbers
The relevant perldelta briefly touches on the subject of the Trie optimization for regexes. Basically, I think, the idea is that you convert the constant part of the regex into hashes of hashes that you can then walk very quickly (if you are using C instead of Perl), like the following hash to match the numbers (11,122,233,31) (except that the following is not valid Perl syntax as the hash needs to be created step-by-step to make the internal forward references work):
my %trie; %trie = ( 1 => { 1 => 'accept', 2 => { 2 => 'accept', 3 => \%trie{2}->{3}, # since we already saw a 23 _default => \%trie, } _default => \%trie }, 2 => { 3 => { 3 => 'accept', _default => \%trie{3}, # we already saw a 3 }, _default => \%trie }, 3 => { 1 => 'accept', _default => \%trie }, _default => \%trie; ); # hash walker omitted
The next step will likely be Aho-Corasick optimizations for regexes, which you can watch animated here.
Updated: After a discussion with bart, I expanded the example to shed more light on the importance of links within the %trie structure. If a match fails for 122, we can continue trying to match 223 and store the knowledge that we've already looked at 22 in the deep link within %trie.
|
|---|
| Replies are listed 'Best First'. | |
|---|---|
|
Re^6: Regexp for range of numbers
by demerphq (Chancellor) on Apr 06, 2005 at 08:31 UTC |