gaal has asked for the wisdom of the Perl Monks concerning the following question:

I have an array of compiled regexes. What's the Best Way™ to compile one regex that's a disjunction of them all, that is, one that matches if any of them matches?

I know how to do all these:

# 1.
my $any = qr/one|two|three/;

# 2.
my $one   = qr/one/;
my $two   = qr/two/;
my $three = qr/three/;
my $any   = qr/$one|$two|$three/;

# 3.
my @pats = (qr/one/, qr/two/, qr/three/);
my $any  = qr/$pats[0]|$pats[1]|$pats[2]/; # ick

But I want:

my @pats = map qr/$_/, @some_input;
my $any = ??? # no resort to explicit indices

Also, in my code the separate patterns are compiled much before the disjunction, so this is not what I'm looking for:

my $joined = join ")|(?:", @some_input;
my $any = qr/(?:$joined)/;

Replies are listed 'Best First'.
Re: Programmatic regex disjunction
by Aristotle (Chancellor) on Nov 21, 2005 at 17:11 UTC

    Also, in my code the separate patterns are compiled much before the disjunction, so this is not what I’m looking for:

    But that is what you get. You cannot combine the compiled forms of precompiled regexen, they must be stringified for concatenation and the result has to be recompiled. There’s no way around that.

    So the last snippet is what you want. Sorry.

    Btw, the stringified regexen supply their own (?:) delimitation, so all you need to do is

    my $joined = do { local $_ = join '|', @pats; qr/$_/; };
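A minimal sketch of the point about self-delimiting stringification. The two patterns here are illustrative assumptions, not from the original post; the first carries an /i flag to show that each branch keeps its own modifiers after joining.

```perl
use strict;
use warnings;

# Illustrative patterns (an assumption, not from the thread):
# the first is case-insensitive, the second is not.
my @pats = (qr/one/i, qr/two/);

# Interpolating a qr// object stringifies it inside its own (?...) group,
# flags included, so no extra (?:...) wrapping is needed around each branch.
my $any = do { my $alt = join '|', @pats; qr/$alt/ };

for my $s ('ONE', 'two', 'TWO') {
    printf "%-4s %s\n", $s, ($s =~ $any ? 'matches' : 'does not match');
}
```

The combined pattern is still recompiled from the joined string; only the grouping and flags come along for free.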

    As an aside, you may be interested in Regexp::Assemble.

    Update: added do as per gaal’s reply.

    Makeshifts last the longest.

      I didn't like the use of $_ to build a pattern, just as a matter of taste. I figure if you're going to localize a variable for this, make it the auto-joiner ($", the list separator), taking advantage of the fact that regex context is double-quotish for variable interpolation (it would make sense to me if Perl magically set $" for array interpolation):
      my $joined = do { local $" = '|'; qr/@pats/; };
      though I'd likely do it like this:
      my $joined = qr/${\join '|', @pats}/;
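A quick sketch, assuming three sample patterns, suggesting that the two forms above build the same pattern and that the localized $" goes back to its default once the do block exits.

```perl
use strict;
use warnings;

my @pats = map { qr/$_/ } qw(one two three);   # sample input, assumed

# Form 1: let array interpolation do the joining via the list separator $".
my $via_sep = do { local $" = '|'; qr/@pats/ };

# Form 2: interpolate the result of an explicit join.
my $via_join = qr/${\ join '|', @pats}/;

# Compare the stringified patterns and try one of them on a sample string.
print "same pattern\n" if "$via_sep" eq "$via_join";
print "matches\n"      if 'twofold' =~ $via_sep;
```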

      Caution: Contents may have been coded under pressure.

      “You cannot combine the compiled forms of precompiled regexen, they must be stringified for concatenation and the result has to be recompiled. There’s no way around that.”
      There is; that is exactly what (??{$someregex}) does.

        Ah! Of course.

        But a quick test shows me that that too does not propagate captures from the delayed patterns to the including pattern, which is why I memorised the rule as “combining precompiled patterns requires recompilation.”

        (Most of the time I build a pattern programmatically from a list involves matching one of a list of hash keys, some way or other, so capturing is always involved in some fashion.)
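A small sketch of that capture behaviour, using a made-up sub-pattern and input string (both assumptions). It contrasts plain interpolation, where the inner group becomes part of the outer pattern, with a postponed subexpression, where it should not.

```perl
use strict;
use warnings;

my $digits = qr/(\d+)/;   # sample sub-pattern with its own capture group

# Interpolation: the sub-pattern is recompiled into the outer pattern,
# so its group is numbered as part of the whole.
my $ok_interp = 'abc123' =~ /([a-z]+)$digits/;
my @interp = ($1, $2);

# Postponed subexpression: matched as an independent pattern at run time,
# so its group does not become $2 of the outer match.
my $ok_delayed = 'abc123' =~ /([a-z]+)(??{ $digits })/;
my @delayed = ($1, $2);

printf "interpolated: \$2 is %s\n", defined $interp[1]  ? $interp[1]  : 'undef';
printf "postponed:    \$2 is %s\n", defined $delayed[1] ? $delayed[1] : 'undef';
```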

        Thanks for the correction.

        Makeshifts last the longest.

      Don't you need a do there?

      Nowadays = { ... } looks like a closure assignment to me because of Perl 6.

Re: Programmatic regex disjunction
by Tanktalus (Canon) on Nov 21, 2005 at 17:09 UTC

    Probably not what you're looking for, but I'd be tempted, without more context of where you're using this, to try:

    use List::Util qw(first);
    my $matched = first { $var =~ $_ } @pats;
    But that probably doesn't quite fill your requirement, I'm betting.
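For completeness, a self-contained sketch of the first approach with an assumed sample string. It leaves the patterns separate and reports which one matched, rather than building a single disjunction.

```perl
use strict;
use warnings;
use List::Util qw(first);

my @pats = (qr/one/, qr/two/, qr/three/);
my $var  = 'twenty-two';   # sample input, assumed

# first() returns the first element for which the block is true,
# here the first pattern object that matches, or undef if none does.
my $matched = first { $var =~ $_ } @pats;

print defined $matched ? "matched by $matched\n" : "no match\n";
```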

      That is in fact exactly what I'm doing now. :-)

      ++

Re: Programmatic regex disjunction
by Zaxo (Archbishop) on Nov 22, 2005 at 05:28 UTC

    This is my favorite way:

    my $any = do { local $" = '|'; qr/@pats/; };
    Some may call it obfuscated, but I think it's particularly clean.

    After Compline,
    Zaxo

Re: Programmatic regex disjunction
by diotalevi (Canon) on Nov 21, 2005 at 19:24 UTC
    Regexp::Optimize is meant for just this task except that it won't use the pre-compiled version of your regexp. It'll stringify them first, I think.
Re: Programmatic regex disjunction
by ysth (Canon) on Nov 22, 2005 at 04:31 UTC
    This should work:
    use re "eval";
    my @pats = ( qr/one/, qr/two/, qr/three/ );
    my $joined = "(??{" . join("})|(??{", map "\$pats[$_]", 0..$#pats) . "})";
    my $any = qr/$joined/;

      I really loathe the “join with right+left delimiter” idiom for paired delimiters. Since you don’t need a map for this, let me demonstrate that extreme to show why I don’t like it:

      my $joined = "(??{ \$pats[" . join( "] })|(??{ \$pats[", 0 .. $#pats ) . "] })";

      Uhm, oh dear.

      Instead, I’d always prefer a map to put the delimiters in place before joining – it’s clearer, with less repetition of literals to boot. You’ve recognised the need for a map to clarify anyway – so take it to conclusion:

      my $joined = join '|', map "(??{ \$pats[$_] })", 0 .. $#pats;

      I think that looks a lot friendlier (than your code; to say nothing of my demonstration of atrocity).
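Put together as a runnable sketch, with the same three sample patterns and two made-up test strings (assumptions). Note that use re 'eval' is needed because the code blocks reach the regex through interpolation.

```perl
use strict;
use warnings;
use re 'eval';   # required because the code blocks arrive via interpolation

my @pats = (qr/one/, qr/two/, qr/three/);

# One postponed subexpression per array index; each looks up its
# precompiled pattern in @pats when the match runs.
my $joined = join '|', map "(??{ \$pats[$_] })", 0 .. $#pats;
my $any    = qr/$joined/;

for my $s ('twofold', 'four') {
    printf "%-8s %s\n", $s, ($s =~ $any ? 'matches' : 'does not match');
}
```

The trade-off discussed earlier in the thread still applies: captures inside the individual patterns are not visible to the combined one.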

      Makeshifts last the longest.

Re: Programmatic regex disjunction
by NetWallah (Canon) on Nov 22, 2005 at 01:51 UTC
    Would you consider using the any function in Quantum::Superpositions?

    I have not used that in this context, but it may do what you need.

         "Man cannot live by bread alone...
             He'd better have some goat cheese and wine to go with it!"

      Probably not in Perl 5. But with Pugs I already can, and would.

      my @pats = /one/, /two/, /three/;
      say "matched" if $string ~~ any @pats;

      Then again, Perl 6 Rules have better composition, uh, rules, so there's likely a way to do it without a junction. One reason to insist is that where

      $string ~~ any @pats

      matches,

      my $any = any @pats;
      $string ~~ $any;

      does not. This means that I can't use a junction and a precompiled disjunctive pattern interchangeably, e.g. in a given block:

      given $string {
          when $any { ... } # won't work if $any is a junction
      }

      At least, that's how it operates now; I'd better ask on p6-l if that's the desired behavior. :-)

      Update: come to think of it, it's almost certainly a pugsbug.