Re^2: Regexp for range of numbers

Replies are listed 'Best First'.
Re^3: Regexp for range of numbers by Corion (Patriarch) on Apr 05, 2005 at 15:55 UTC
Please note that I mentioned 5.9.2, the most current release of the developer branch, which has a patch by demerphq, that builds a very optimized tree out of a list like mine. The tree (actually a Trie) can the optimally match/compare against the string, by looking at exactly the first three chars of the string and quickly decide if it matches or not.	[reply]
Re^4: Regexp for range of numbers by 5mi11er (Deacon) on Apr 05, 2005 at 17:50 UTC
Very cool. So is there detailed information available somewhere about this new feature? I imagine it's doing some hash or binary tree type searches... -Scott	[reply]
Re^5: Regexp for range of numbers by Corion (Patriarch) on Apr 05, 2005 at 20:21 UTC
The relevant perldelta briefly touches on the subject of the Trie optimization for regexes. Basically, I think, the idea is that you convert the constant part of the regex into hashes of hashes that you can then walk very quickly (if you are using C instead of Perl), like the following hash to match the numbers (11,122,233,31) (except that the following is not valid Perl syntax as the hash needs to be created step-by-step to make the internal forward references work): `my %trie; %trie = ( 1 => { 1 => 'accept', 2 => { 2 => 'accept', 3 => \%trie{2}->{3}, # since we already saw a 23 _default => \%trie, } _default => \%trie }, 2 => { 3 => { 3 => 'accept', _default => \%trie{3}, # we already saw a 3 }, _default => \%trie }, 3 => { 1 => 'accept', _default => \%trie }, _default => \%trie; ); # hash walker omitted` [download] The next step will likely be Aho-Corasick optimizations for regexes, which you can watch animated here. Updated: After a discussion with bart, I expanded the example to shed more light on the importance of links within the `%trie` structure. If a match fails for `122`, we can continue trying to match `223` and store the knowledge that we've already looked at `22` in the deep link within `%trie`.	[reply] [d/l]
Re^6: Regexp for range of numbers by demerphq (Chancellor) on Apr 06, 2005 at 08:31 UTC