in reply to Numeric list to optimised regexp

Out of curiosity, how does its output compare to my more general-purpose "create an optimized RE for this list" that I gave in RE (tilly) 4: SAS log scanner?

RE: RE (tilly) 1: Numeric list to optimised regexp
by ncw (Friar) on Sep 07, 2000 at 12:54 UTC
    I didn't realise anyone had had a go at this sort of thing already, though it is inevitable really!

    I'll answer tilly's question with an example:

    My code gives (for the list 1..255)

    [1-9]|(?:[1-9]|1\d|2[0-4])\d|25[0-5]
    Whereas your code gives
    ((?:1(?:|0(?:|0|1|2|3|4|5|6|7|8|9)|1(?:|0|1|2|3|4|5|6|7|8|9)|2(?:|0| +1|2|3|4|5|6|7|8|9)|3(?:|0|1|2|3|4|5|6|7|8|9)|4(?:|0|1|2|3|4|5|6|7|8|9 +)|5(?:|0|1|2|3|4|5|6|7|8|9)|6(?:|0|1|2|3|4|5|6|7|8|9)|7(?:|0|1|2|3|4| +5|6|7|8|9)|8(?:|0|1|2|3|4|5|6|7|8|9)|9(?:|0|1|2|3|4|5|6|7|8|9))|2(?:| +0(?:|0|1|2|3|4|5|6|7|8|9)|1(?:|0|1|2|3|4|5|6|7|8|9)|2(?:|0|1|2|3|4|5| +6|7|8|9)|3(?:|0|1|2|3|4|5|6|7|8|9)|4(?:|0|1|2|3|4|5|6|7|8|9)|5(?:|0|1 +|2|3|4|5)|6|7|8|9)|3(?:|0|1|2|3|4|5|6|7|8|9)|4(?:|0|1|2|3|4|5|6|7|8|9 +)|5(?:|0|1|2|3|4|5|6|7|8|9)|6(?:|0|1|2|3|4|5|6|7|8|9)|7(?:|0|1|2|3|4| +5|6|7|8|9)|8(?:|0|1|2|3|4|5|6|7|8|9)|9(?:|0|1|2|3|4|5|6|7|8|9)))
    My aim was to get rid of as many alternations as possible (which are slow) and turn them into character classes (which are fast). I wanted also to factor the regexp as much as possible.
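For anyone who wants to sanity-check a factored pattern like the short one above, here is a minimal sketch. It is written in Python rather than Perl purely for illustration, and the helper name `accepted` and the candidate list are my own assumptions, not part of either poster's code. It confirms that the compact pattern, when anchored at both ends, accepts exactly the strings "1" through "255", and compares it with a naive one-alternative-per-number baseline.

```python
import re

# The compact pattern quoted above for the list 1..255.
COMPACT = r"[1-9]|(?:[1-9]|1\d|2[0-4])\d|25[0-5]"


def accepted(pattern, candidates):
    """Return the set of candidates that the pattern matches in full."""
    rx = re.compile(pattern)
    return {s for s in candidates if rx.fullmatch(s)}


# Candidates deliberately include zero, leading zeros and out-of-range values.
candidates = [str(n) for n in range(0, 1000)] + ["00", "01", "007", "0255"]
wanted = {str(n) for n in range(1, 256)}

# Naive baseline: one alternative per number, with no factoring at all.
NAIVE = "|".join(str(n) for n in range(1, 256))

compact_ok = accepted(COMPACT, candidates) == wanted
naive_ok = accepted(NAIVE, candidates) == wanted

print("compact pattern correct:", compact_ok)
print("naive pattern correct:  ", naive_ok)
print("lengths:", len(COMPACT), "vs", len(NAIVE))
```

Note that the pattern is only exact when anchored (here via `fullmatch`); unanchored, its first alternative `[1-9]` would happily match the leading digit of "999".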

    If you change my code to replace every \d with \w (or whatever class suits your data), it should work fine for any list of words, but I designed and tested it with numeric lists in mind.

    My first attempt at this problem used a trie-like data structure, but I abandoned it once I had the idea of using backtracking regexps. The irony of using regexps to optimise regexps was irresistible!
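To make the trie idea concrete, here is a rough sketch of how a prefix trie can be turned into a regexp, with sibling leaves folded into a character class instead of an alternation. This is not the abandoned code mentioned above, nor tilly's code; it is an illustration in Python, and every name in it (`build_trie`, `char_class`, `to_regex`, `END`) is hypothetical.

```python
import re

END = ""  # marker key: a word ends at this node


def build_trie(words):
    """Build a nested-dict prefix trie from an iterable of strings."""
    root = {}
    for word in words:
        node = root
        for ch in word:
            node = node.setdefault(ch, {})
        node[END] = {}
    return root


def char_class(chars):
    """Render single characters as a class, folding runs such as 0-9."""
    chars = sorted(chars)
    if len(chars) == 1:
        return re.escape(chars[0])
    parts, i = [], 0
    while i < len(chars):
        j = i
        while j + 1 < len(chars) and ord(chars[j + 1]) == ord(chars[j]) + 1:
            j += 1
        if j - i >= 2:  # three or more consecutive characters: use a range
            parts.append(re.escape(chars[i]) + "-" + re.escape(chars[j]))
        else:
            parts.extend(re.escape(c) for c in chars[i:j + 1])
        i = j + 1
    return "[" + "".join(parts) + "]"


def to_regex(node):
    """Emit a regexp fragment matching every word suffix below this node."""
    optional = END in node  # a word may also stop right here
    leaves, branches = [], []
    for ch, child in sorted(node.items()):
        if ch == END:
            continue
        if list(child) == [END]:  # child only ends a word: a leaf
            leaves.append(ch)
        else:
            branches.append(re.escape(ch) + to_regex(child))
    alts = branches + ([char_class(leaves)] if leaves else [])
    if not alts:
        return ""
    if len(alts) == 1 and not optional:
        return alts[0]
    if len(alts) == 1 and not branches:
        return alts[0] + "?"  # a lone class or character takes ? directly
    body = "(?:" + "|".join(alts) + ")"
    return body + "?" if optional else body


pattern = to_regex(build_trie(str(n) for n in range(1, 256)))
print(pattern)
```

Because a prefix trie only shares common beginnings, the result for 1..255 still repeats the `[0-9]?` tail under each second digit; merging those shared endings is the extra factoring step that the hand-tuned pattern earlier in this reply achieves.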

      OK, this is nice. OTOH I really want to see the win moved down to the RE engine, and at least one optimization that was discussed with Ilya would move all of the wins from both of our approaches down.

      So someday you should see all matches speed up because of this kind of logic, without having to do any work for it... :-)

RE: RE (tilly) 1: Numeric list to optimised regexp
by tye (Sage) on Sep 07, 2000 at 02:56 UTC

    You might check out Text::Trie (by Ilya). So it might be useful to have "right" Tries to go with these "left" Tries.

            - tye (but my friends call me "Tye")
      I didn't know that he released that as a module. I would hope that someday this win is native in the RE...