morgon has asked for the wisdom of the Perl Monks concerning the following question:

Dear Monks,

Assume I have a list of "simple enough" regexes (in the sense that they don't contain any backrefs, embedded code or the like).

Is it possible to decide (and if so, what would be a good algorithm for it) whether every possible string matches at most one regex in the list, or, put another way, whether no possible string matches two or more regexes in the list?

So I want an algorithm that takes a list of regexes and decides whether they are mutually exclusive, in the sense that no string would ever match more than one regex in the list.

Any ideas?

Many thanks!

Re: mutual exclusion of regexes
by blokhead (Monsignor) on Apr 14, 2009 at 00:02 UTC
    Yes, it is certainly possible to decide this property. Put another way, given two regular expressions, you want to know whether their intersection is empty.

    From a theoretical point of view, it is no problem to compute the intersection of two regular expressions. From a practical point of view, it is a bit of a pain, since you must convert to DFAs somewhere along the way. NFAs and regular expressions aren't immediately amenable to the intersection operation. Keep in mind that these kinds of problems (intersecting and/or complementing languages represented as regular expressions) are usually PSPACE-hard, which is even worse than NP-hard.
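
A minimal sketch of the DFA intersection step, assuming each regex has already been converted to a DFA by some other means. The hash layout (`start`, `accept`, `delta`) and the name `intersection_is_empty` are hypothetical and are not taken from any existing module. The DFAs are assumed to describe whole-string matches, so an unanchored Perl regex would first have to be treated as if it were wrapped in `.*` on both sides. The code walks the product automaton breadth-first and stops as soon as it reaches a pair of states that both accept.

```perl
use strict;
use warnings;

# Hypothetical DFA layout (an assumption for illustration):
#   { start  => $state,
#     accept => { $state => 1, ... },
#     delta  => { $state => { $char => $next_state, ... }, ... } }
# A missing transition means the DFA rejects (implicit dead state).
sub intersection_is_empty {
    my ($d1, $d2) = @_;
    my @queue = ([ $d1->{start}, $d2->{start} ]);
    my %seen  = (join("\0", $d1->{start}, $d2->{start}) => 1);

    while (my $pair = shift @queue) {
        my ($p, $q) = @$pair;

        # A reachable pair in which both states accept means that
        # some string is accepted by both DFAs.
        return 0 if $d1->{accept}{$p} && $d2->{accept}{$q};

        my $t1 = $d1->{delta}{$p} || {};
        my $t2 = $d2->{delta}{$q} || {};
        for my $c (keys %$t1) {
            next unless exists $t2->{$c};    # both DFAs must be able to move on $c
            my ($np, $nq) = ($t1->{$c}, $t2->{$c});
            next if $seen{ join("\0", $np, $nq) }++;
            push @queue, [ $np, $nq ];
        }
    }
    return 1;    # no jointly accepting pair is reachable
}

# Hand-built example DFAs for a+, b+ and a*
my $a_plus = { start => 0, accept => { 1 => 1 },
               delta => { 0 => { a => 1 }, 1 => { a => 1 } } };
my $b_plus = { start => 0, accept => { 1 => 1 },
               delta => { 0 => { b => 1 }, 1 => { b => 1 } } };
my $a_star = { start => 0, accept => { 0 => 1 },
               delta => { 0 => { a => 0 } } };

print intersection_is_empty($a_plus, $b_plus) ? "disjoint\n" : "overlapping\n";
print intersection_is_empty($a_plus, $a_star) ? "disjoint\n" : "overlapping\n";
```

The product automaton has at most |Q1| * |Q2| state pairs, so this step is cheap once the DFAs exist. The potentially expensive part is the regex to DFA conversion that precedes it.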

    If you don't want to code up algorithms for regular expression → DFA conversion and DFA intersection, you might want to check out this suite that does these kinds of manipulations of regular expressions. The authors (myself and another monk) have not worked on it for quite some time, but I think that this simple kind of functionality is in the code. I believe that something like this would work:

    use FLAT;
    my $r1 = FLAT::Regex->new( $expr1 );
    my $r2 = FLAT::Regex->new( $expr2 );
    print $r1->intersect($r2)->is_empty ? "disjoint" : "overlapping";
    Extending this to a larger number of regular expressions just means checking the intersection of every pair of regexes in the list.
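
The pairwise check could be sketched as follows. The name `overlapping_pairs` is hypothetical, and `$disjoint` is a caller-supplied code reference (also an assumption) that returns true when two regexes can never match the same string, for instance a wrapper around whatever intersection test is available.

```perl
use strict;
use warnings;

# Returns every pair of regexes that the supplied predicate does NOT
# consider disjoint. An empty result means the list is mutually exclusive.
# $disjoint is a hypothetical callback: $disjoint->($re_a, $re_b) is true
# when no string can match both regexes.
sub overlapping_pairs {
    my ($disjoint, @regexes) = @_;
    my @pairs;
    for my $i (0 .. $#regexes - 1) {
        for my $j ($i + 1 .. $#regexes) {
            push @pairs, [ $regexes[$i], $regexes[$j] ]
                unless $disjoint->($regexes[$i], $regexes[$j]);
        }
    }
    return @pairs;
}
```

For n regexes this performs n(n-1)/2 intersection tests, so the cost of each individual test dominates for long lists.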

    Update: some related reading: Comparative satisfiability of regexps; Testing regex equivalence; Negating Regexes: Tips, Tools, And Tricks Of The Trade.

    blokhead

Re: mutual exclusion of regexes
by Bloodnok (Vicar) on Apr 14, 2009 at 11:22 UTC
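    Using a hash keyed on the regex, e.g. (untested):
    my @regexes = qw/re_1 re_2 re_3 re_4 .../;
    my %result  = ();
    # $string holds the single input string being tested
    foreach my $re (@regexes) {
        $result{$re} += $string =~ /$re/ ? 1 : 0;
    }
    # Count the regexes that matched $string; a count above 1 means
    # this particular string is matched by more than one regex
    my $count = grep { $_ } values %result;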
    A user level that continues to overstate my experience :-))