in reply to Re (tilly) 1: Donuts and DFA's
in thread Research ideas

OK tilly, I think I understand now. Basically, one simple way to do this optimization would be to keep a list of all the states you've visited so far that have failed: not the exact state that failed, but the furthest-back state that would definitively lead to that failure.

Then, before proceeding to any further state, you check that your current state isn't in the list of failed states. You could also prune entries that are too old to be relevant anymore, and organize the failed-state list around the current state so the lookup stays cheap.

The other thing you could do to make it more efficient is, instead of collapsing each failure to the furthest-back state, allow branching and store multiple fail-states for each failure that occurs.
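To make that concrete, here is a minimal sketch of the idea in Perl. The token format, the match() routine, and the example pattern are all invented for illustration (this is not how perl's engine represents things): a backtracking matcher remembers every (token, position) state it has proven to fail, and refuses to explore such a state a second time.

    #!/usr/bin/perl
    use strict;
    use warnings;

    # Toy pattern representation: a list of tokens, each either a literal
    # character or a starred character.  %failed remembers every
    # (token index, string position) pair already proven to fail, and the
    # matcher checks it before exploring any state.
    sub match {
        my ($tokens, $str) = @_;
        my %failed;                     # "$ti,$pos" => 1 for known-bad states
        my $try;

        $try = sub {
            my ($ti, $pos) = @_;
            my $key = "$ti,$pos";
            return 0 if $failed{$key};          # this state already failed

            if ($ti == @$tokens) {              # no tokens left: need end of string
                return 1 if $pos == length $str;
                $failed{$key} = 1;
                return 0;
            }

            my ($kind, $ch) = @{ $tokens->[$ti] };
            if ($kind eq 'lit') {
                return 1 if substr($str, $pos, 1) eq $ch
                         && $try->($ti + 1, $pos + 1);
            }
            else {                              # 'star': longest repetition first
                my $end = $pos;
                $end++ while $end < length($str) && substr($str, $end, 1) eq $ch;
                for (my $p = $end; $p >= $pos; $p--) {
                    return 1 if $try->($ti + 1, $p);
                }
            }
            $failed{$key} = 1;                  # record the failure
            return 0;
        };

        my $ok = $try->(0, 0);
        undef $try;                             # break the closure's self-reference
        return $ok;
    }

    # Roughly /^a*a*a*c$/ against a string of a's with no 'c': without the
    # cache the same (token, position) states get revisited over and over;
    # with it, each state can fail only once.
    my @tokens = ( ['star', 'a'], ['star', 'a'], ['star', 'a'], ['lit', 'c'] );
    print match(\@tokens, 'a' x 30) ? "matched\n" : "no match\n";

With the cache, each state can fail at most once, so the total work is bounded by (number of tokens) x (length of the string) instead of growing with the amount of backtracking.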

State machines are odd beasts :).

Ciao,
Gryn

Re (tilly) 1: The Return of the Donuts
by tilly (Archbishop) on Feb 21, 2001 at 00:45 UTC
    Yup. In fact, Perl 5.6 does a version of this. However, doing it with a run-time cache has several serious disadvantages.

    First of all, it is much slower, because you are always having to check the cache and add information to it. With 5.6 the RE is labelled simple or complex and profiled at run time, with caching turned on only if the match seems to be going slowly.

    Secondly, there are a variety of situations in which you have to throw away the cache. At least one of the RE bugs in 5.6.0 is that it was not smart enough about doing this and got it wrong in a few cases. It appears that with my approach it takes less work to figure out when you can and cannot optimize.

    Thirdly, when you do the work at compilation you get multiple speed wins for the price of one. For instance, this one also covers trieing the RE, which is something that run-time caching does not do (there is a sketch of what that means below).

    And so on.
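    To illustrate the trieing point above, here is a rough sketch in Perl. The pattern and the build_trie()/trie_match() routines are invented for illustration and are not perl's internals; the point is only that the literal alternatives of something like /^(foo|for|foul|bar)/ can be compiled once into a shared prefix tree, so matching walks each input character at most once instead of retrying every alternative from the start.

        #!/usr/bin/perl
        use strict;
        use warnings;

        # Compile a list of literal alternatives into a trie of nested hashes.
        # Branch keys are single characters, so the multi-character 'end' key
        # marking a complete alternative can never collide with them.
        sub build_trie {
            my @words = @_;
            my %trie;
            for my $w (@words) {
                my $node = \%trie;
                for my $ch (split //, $w) {
                    $node = $node->{$ch} ||= {};    # descend, creating as needed
                }
                $node->{end} = 1;                   # an alternative ends here
            }
            return \%trie;
        }

        # True if some alternative matches at the start of $str -- roughly
        # what an anchored alternation of literals decides.
        sub trie_match {
            my ($trie, $str) = @_;
            my $node = $trie;
            for my $ch (split //, $str) {
                return 1 if $node->{end};           # a full alternative already seen
                $node = $node->{$ch} or return 0;   # no branch: fail immediately
            }
            return $node->{end} ? 1 : 0;
        }

        my $trie = build_trie(qw(foo for foul bar));
        print trie_match($trie, 'foulest') ? "match\n" : "no match\n";   # match
        print trie_match($trie, 'fob')     ? "match\n" : "no match\n";   # no match

    Because the common prefixes are merged once, when the trie is built, every later match gets the benefit for free, which is the same compile-time-versus-run-time trade-off as above.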