Beefy Boxes and Bandwidth Generously Provided by pair Networks
Your skill will accomplish
what the force of many cannot
 
PerlMonks  

Re: Compare two regex patterns

by moritz (Cardinal)
on Mar 06, 2008 at 07:48 UTC ( [id://672388]=note: print w/replies, xml ) Need Help??


in reply to Compare two regex patterns

You could try to compile both regexes with use re 'debug'; and compare those outputs:
perl -c -Mre=debug -e 'm/123*/' Freeing REx: `","' Compiling REx `123*' size 6 Got 52 bytes for offset annotations. first at 1 rarest char 2 at 1 1: EXACT <12>(3) 3: STAR(6) 4: EXACT <3>(0) 6: END(0) ...

The indented part is the interesting one. If two regexes show the same output in the indented part, they will match the same pattern (the reversion isn't always true, because (a|b) and (b|a) behave the same in some regexes, and behave differently in others (not quite sure, but you can construct cases with alternation where the order somtimes matter, and somtimes not)).

I don't know if you can access this information from inside perl, perhaps with one of the B:: modules, if not you can still use backticks or qx to execute a command line like the one I showed.

Replies are listed 'Best First'.
Re^2: Compare two regex patterns
by vbar (Novice) on Mar 06, 2008 at 10:00 UTC
    My Regexp::Compare does that - not for all regexps (that would solve the halting problem :-) ), but "1233*" is recognized as more specific than "123*". On the other hand, the implementation is quite dependent on Perl internals and currently broken for perl 5.10 - the public demand didn't justify an upgrade yet...
      not for all regexps (that would solve the halting problem :-) ),

      To digress a bit on theory: for regular expressions that isn't hard at all.

      Once you have a DFA for your regex, you can build the minimal DFA (that can be done in O(n²) or O(n³), not sure...). You can do that for both regexes you compare, and if the DFAs are isomorphic, both regexes are equivalent.

      Note that checking if two DFAs are isomorphic isn't hard either, because you know the start state.

      But of course Perl's regexps are not regular (in the CS sense), so you can forget everything I said if you're only after a practical solution.

        you can build the minimal DFA (that can be done in O(n²) or O(n³), not sure...)

        Actually that can be done in O(n lg n) using Hopcroft's Algorithm, but it's not a well known or particularly easy algorithm. See Hopcroft's 1971 paper, though even the academics find it a bit dense, which led Gries to write a paper called Describing an algorithm by Hopcroft a year later. Unfortunately the latter is not freely available online.

        All this is surely overboard for the poster's original question, I just wanted to share this tidbit since it's not easy to come by.

        Well, as ikegami says above, OP didn't ask for regexp equivalence, but for partial ordering, which I personally found quite hard enough :-) - but perhaps that's because I don't know the theory... How would you compare minimal DFAs to find (for example) that matching "aa*b" implies matching "ab"?

Log In?
Username:
Password:

What's my password?
Create A New User
Domain Nodelet?
Node Status?
node history
Node Type: note [id://672388]
help
Chatterbox?
and the web crawler heard nothing...

How do I use this?Last hourOther CB clients
Other Users?
Others examining the Monastery: (7)
As of 2024-04-19 09:29 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?

    No recent polls found