Beefy Boxes and Bandwidth Generously Provided by pair Networks
Your skill will accomplish
what the force of many cannot

Re^2: Compare two regex patterns

by vbar (Novice)
on Mar 06, 2008 at 10:00 UTC ( #672418=note: print w/replies, xml ) Need Help??

in reply to Re: Compare two regex patterns
in thread Compare two regex patterns

My Regexp::Compare does that - not for all regexps (that would solve the halting problem :-) ), but "1233*" is recognized as more specific than "123*". On the other hand, the implementation is quite dependent on Perl internals and currently broken for perl 5.10 - the public demand didn't justify an upgrade yet...

Replies are listed 'Best First'.
Re^3: Compare two regex patterns
by moritz (Cardinal) on Mar 06, 2008 at 10:22 UTC
    not for all regexps (that would solve the halting problem :-) ),

    To digress a bit on theory: for regular expressions that isn't hard at all.

    Once you have a DFA for your regex, you can build the minimal DFA (that can be done in O(n) or O(n), not sure...). You can do that for both regexes you compare, and if the DFAs are isomorphic, both regexes are equivalent.

    Note that checking if two DFAs are isomorphic isn't hard either, because you know the start state.

    But of course Perl's regexps are not regular (in the CS sense), so you can forget everything I said if you're only after a practical solution.

      Well, as ikegami says above, OP didn't ask for regexp equivalence, but for partial ordering, which I personally found quite hard enough :-) - but perhaps that's because I don't know the theory... How would you compare minimal DFAs to find (for example) that matching "aa*b" implies matching "ab"?
        Here's how you would do this. Forget minimization, you don't need it. Take NFAs (or DFAs, it doesn't matter) representing all the regexes you want to compare. Mark the final state(s) of each NFA or DFA with something that identifies that particular regex. Then combine them all into a single NFA using alternation. Finally, convert that NFA into a DFA, but when you do, make sure that each final state of the DFA has markers indicating what regex(ex) it was final for.

        Now survey all your final states. If all final states have markers for all the regexes, the regexes are equal. You can also use this to determine if one is a subset of another (all final states have markers for regex1, but only some have markers for regex2), if they overlap (at least one final state had markers for both regexes), or if they are completely disjoint (no final state had markers for both).

      you can build the minimal DFA (that can be done in O(n) or O(n), not sure...)

      Actually that can be done in O(n lg n) using Hopcroft's Algorithm, but it's not a well known or particularly easy algorithm. See Hopcroft's 1971 paper, though even the academics find it a bit dense, which led Gries to write a paper called Describing an algorithm by Hopcroft a year later. Unfortunately the latter is not freely available online.

      All this is surely overboard for the poster's original question, I just wanted to share this tidbit since it's not easy to come by.

Log In?

What's my password?
Create A New User
Domain Nodelet?
Node Status?
node history
Node Type: note [id://672418]
and the web crawler heard nothing...

How do I use this? | Other CB clients
Other Users?
Others contemplating the Monastery: (5)
As of 2022-08-17 10:49 GMT
Find Nodes?
    Voting Booth?

    No recent polls found