in reply to RE (tilly) 1: Numeric list to optimised regexp
in thread Numeric list to optimised regexp

I didn't realise anyone had had a go at this sort of thing already, though it is inevitable really!

I'll anwser tilly's question with an example:-

My code gives (for the list 1..255)

[1-9]|(?:[1-9]|1\d|2[0-4])\d|25[0-5]
Whereas your code gives
((?:1(?:|0(?:|0|1|2|3|4|5|6|7|8|9)|1(?:|0|1|2|3|4|5|6|7|8|9)|2(?:|0| +1|2|3|4|5|6|7|8|9)|3(?:|0|1|2|3|4|5|6|7|8|9)|4(?:|0|1|2|3|4|5|6|7|8|9 +)|5(?:|0|1|2|3|4|5|6|7|8|9)|6(?:|0|1|2|3|4|5|6|7|8|9)|7(?:|0|1|2|3|4| +5|6|7|8|9)|8(?:|0|1|2|3|4|5|6|7|8|9)|9(?:|0|1|2|3|4|5|6|7|8|9))|2(?:| +0(?:|0|1|2|3|4|5|6|7|8|9)|1(?:|0|1|2|3|4|5|6|7|8|9)|2(?:|0|1|2|3|4|5| +6|7|8|9)|3(?:|0|1|2|3|4|5|6|7|8|9)|4(?:|0|1|2|3|4|5|6|7|8|9)|5(?:|0|1 +|2|3|4|5)|6|7|8|9)|3(?:|0|1|2|3|4|5|6|7|8|9)|4(?:|0|1|2|3|4|5|6|7|8|9 +)|5(?:|0|1|2|3|4|5|6|7|8|9)|6(?:|0|1|2|3|4|5|6|7|8|9)|7(?:|0|1|2|3|4| +5|6|7|8|9)|8(?:|0|1|2|3|4|5|6|7|8|9)|9(?:|0|1|2|3|4|5|6|7|8|9)))
My aim was to get rid of as many alternations as possible (which are slow) and turn them into character classes (which are fast). I wanted also to factor the regexp as much as possible.

If you change my code replacing all \d's with \w or whatever it should work fine for any list of words, but I designed and tested it with numeric lists in mind.

My first attempt at this problem used a trie like data structure but I abandonded it once I had the idea of using backtracking regexps - the irony of using regexps to optimise regexps was irresistable!

Replies are listed 'Best First'.
RE (tilly) 3 (nice): Numeric list to optimised regexp
by tilly (Archbishop) on Sep 07, 2000 at 14:29 UTC
    OK, this is nice. OTOH I really want to see the win moved down to the RE engine, and at least one optimization that was discussed with Ilya would move all of the wins from both of our approaches down.

    So someday you should see all matches speed up because of this kind of logic, without having to do any work for it... :-)