The relevant perldelta briefly touches on the subject of the Trie optimization for regexes. Basically, I think, the idea is that you convert the constant part of the regex into hashes of hashes that you can then walk very quickly (if you are using C instead of Perl), like the following hash to match the numbers (11,122,233,31) (except that the following is not valid Perl syntax as the hash needs to be created step-by-step to make the internal forward references work):
my %trie;
%trie = (
1 => { 1 => 'accept',
2 => {
2 => 'accept',
3 => \%trie{2}->{3}, # since we already saw a 23
_default => \%trie,
}
_default => \%trie },
2 => { 3 => {
3 => 'accept',
_default => \%trie{3}, # we already saw a 3
},
_default => \%trie
},
3 => { 1 => 'accept', _default => \%trie },
_default => \%trie;
);
# hash walker omitted
The next step will likely be Aho-Corasick optimizations for regexes, which you can watch animated here.
Updated: After a discussion with bart, I expanded the example to shed more light on the importance of links within the %trie structure. If a match fails for 122, we can continue trying to match 223 and store the knowledge that we've already looked at 22 in the deep link within %trie. |