in reply to Re^4: Regexp for range of numbers
in thread Regexp for range of numbers

The relevant perldelta briefly touches on the subject of the Trie optimization for regexes. Basically, I think, the idea is that you convert the constant part of the regex into hashes of hashes that you can then walk very quickly (if you are using C instead of Perl), like the following hash to match the numbers (11,122,233,31) (except that the following is not valid Perl syntax as the hash needs to be created step-by-step to make the internal forward references work):

my %trie; %trie = ( 1 => { 1 => 'accept', 2 => { 2 => 'accept', 3 => \%trie{2}->{3}, # since we already saw a 23 _default => \%trie, } _default => \%trie }, 2 => { 3 => { 3 => 'accept', _default => \%trie{3}, # we already saw a 3 }, _default => \%trie }, 3 => { 1 => 'accept', _default => \%trie }, _default => \%trie; ); # hash walker omitted

The next step will likely be Aho-Corasick optimizations for regexes, which you can watch animated here.

Updated: After a discussion with bart, I expanded the example to shed more light on the importance of links within the %trie structure. If a match fails for 122, we can continue trying to match 223 and store the knowledge that we've already looked at 22 in the deep link within %trie.

Replies are listed 'Best First'.
Re^6: Regexp for range of numbers
by demerphq (Chancellor) on Apr 06, 2005 at 08:31 UTC

    The structure you have there is actually more like the DFA equvivelent of an A-C matcher. If you remove the default/back links you get a raw trie. Aho-Corasick matching actually uses an array of fail transitions, so you can think of it as the "unfilled in" version of what you have above. This disadvantage of this is that its slightly slower at run time, but the advantage is its a lot faster to construct the array than it is to fill in the trie, it also makes it easier to simulate NFA leftmost-firstoccuring behaviour instead of "first encountered" or "leftmost longest" as you know when you are doing a fail transition and not a success transition.

    I might add more later. There are a number of nodes on PM that discuss Trie's in detail, with implementation examples and the like.

    ---
    demerphq