in reply to Re: Comparative satisfiability of regexps.
in thread Comparative satisfiability of regexps.

Actually Friedl is making a point about how the engine would match, not that the state tables involved would be the same. IOW there is no guarantee that a DFA constructed from abc would have the same state transition tables as from [aa-a](b|b{1}|b)c.

Instead of trying to build a DFA regex engine (a non trivial task) I'd suggest just writing some code that can normalize two regexes to the same representational format. If you can normalize two regexes to the same pattern then they are the same. Of course this doesnt help when it turns out that the receivers regex can accept a superset of the emitters pattern generator. Ie: the emitting regex could be /a/ and the receiving regex could be /a|b/ which means the receiving regex can handle anything that the emitting regex can produce but vice versa is not true. IMO if you figure that problem out you are well on your way to solving the halting problem.

---
demerphq

Replies are listed 'Best First'.
Re^3: Comparative satisfiability of regexps.
by tall_man (Parson) on Jan 19, 2005 at 22:46 UTC
    It is true that the two DFAs may not be exactly alike. However, any two DFAs can be proved equivalent (or not) in finite time, according to this article (in PowerPoint format).

    Of course, "finite" is not the same thing as "practical". The brute-force solution mentioned in that article is exponential (test all strings of length equal to the number of states). There is a second method that XORs the two DFAs that looks more promising.

    It looks like either method could be adapted to test whether one DFA accepts a subset of the other.

    Update: blokhead below gives the same solution in a more thoroughly-explained way.

Halting problem? Sheesh, I hope not.
by Meowse (Beadle) on Jan 19, 2005 at 21:45 UTC
    Any technology provably equivalent to a perpetual motion machine is junk. :-)

    Actually, what I'm trying to resolve is the "receivers regex can accept a superset of the emitters pattern generator" question. Specifically, whether that is equivalent to the halting problem or not.

    As I said in another note, I believe that DFAs require a memory strip to be Turing-complete, and thus analysis of regexps doesn't run into the halting problem. But I'm not certain of that, and my last Comp. Sci. course was very long ago.

    Resolving to canonical form is a good first step, but it's not entirely clear how to algorithmically determine that, for example, aaaab?aaaa is a subset of aaaaab*aaaaa. That one took a moment to prove and double-check in my head, and it's a *really* trivial case compared to some I'd like to verify.

    Brain-achingly,

    Mickey.

      Well if it isnt the the halting problem exactly i suspect its damn near as hard. How about these two:

      /aaaa(bc|bd|be|(b|b)(a(a(a(a)))|baba|aaaa)/ /aaaab?aaaa/

      I suspects its a lot easier to show that regex E can produce something that regex R wont accept, but saying for sure that R will accept all patterns producable by E isnt so easy. And IMO neither are approachable problems at all. Id be thinking of alternate solutions. One possibility is to produce something that generates random data from the E pattern and then hammer the R pattern with the products to see but obviously this wont be a proof, just a strong argument.

      /me wonders if he still has his data from regex code...

      ---
      demerphq