John M. Dlugosz has asked for the wisdom of the Perl Monks concerning the following question:

I want to add optional stuff to a compiled regex based on an option. So I'll have $r= qr/this-stuff | more-stuff/x; where more-stuff may or may not be present.

If I write $r= qr/this-stuff $more/x where $more may be empty or contain "| more-stuff", then $more is a string, not a regex. I suppose I could do that, but I'd have to be more careful about piecing together a string first that will then be compiled. If I had $r= qr/this-stuff | $more/x; and made $more another qr like I want, then I can't get rid of the | at the main level. My current thought is to make $more point to either the optional stuff OR, in the off mode, contain something that always fails. Seems awkward though.

Any other suggestions?

Re: Composing regex's dynamically
by Corion (Patriarch) on Apr 27, 2011 at 12:37 UTC

    An alternative would be to construct the final regex from the parts:

    my @parts = qr/this-stuff/;
    if ($foo_needed) {
        push @parts, qr/more-stuff/i;
    };
    my $r = join "|", @parts;
    $r = qr/$r/; # just to be explicit about what the string is supposed to contain

      You are joining strings, yet @parts contains compiled re's. Wouldn't that join the stringified forms of the regex's, not the compiled stuff itself?

      But I see the general point: Given $r1 and maybe $r2 as compiled regexs, I could say $r= qr/$r1|$r2/ Or if absent say $r=$r1;.

      I'm finding that I'm limited in how I break up a huge regex into separate pieces. Pieces can contain named captures and MARK:names, and the assembled result works the same as if it were written as one big string. But a named recursive reference (?&name) must be present when the regex is compiled, not (just) when it is used later.
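The compile-time requirement for (?&name) can be sketched as follows. The group name num and the two pieces are hypothetical, chosen only for illustration; the pieces are kept as plain strings so that nothing is compiled before the definition and the reference sit in the same pattern.

```perl
use strict;
use warnings;

# Hypothetical pieces, kept as strings so nothing is compiled too early.
my $use_num = '(?&num) (?: \+ (?&num) )*';    # refers to the group "num"
my $def_num = '(?(DEFINE) (?<num> \d+ ) )';   # defines the group "num"

# Compiling the referring piece on its own dies, because "num" is not
# defined anywhere in that pattern; eval traps the error.
my $alone = eval { qr/$use_num/x };
print "piece alone: ", (defined $alone ? "compiled" : "failed to compile"), "\n";

# Compiling both pieces as one pattern works.
my $src  = '\A ' . $use_num . ' \z ' . $def_num;
my $full = qr/$src/x;

for my $input ('12+34', '12+') {
    print "$input: ", ($input =~ $full ? "match" : "no match"), "\n";
}
```

Keeping such pieces as strings until the final qr// is one way around the limitation; it does not help if a piece has to be usable as a compiled regex on its own.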

        Regexes and strings are interchangeable. "Joining" two regexes (by interpolating them into a third) will join their stringified versions.

        If you want to "compile" a string to a regex, use qr() on the final string, like I did.
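A small sketch of what happens in the join, using hypothetical patterns cat and dog in place of this-stuff and more-stuff. The exact stringified form of a qr// object differs between Perl versions, so the code prints it instead of assuming it.

```perl
use strict;
use warnings;

# Hypothetical stand-ins for the thread's this-stuff / more-stuff.
my @parts      = (qr/cat/);
my $foo_needed = 1;
push @parts, qr/dog/i if $foo_needed;

# join() stringifies each qr// object. The /i flag travels inside the
# stringified form of the second part, so it stays local to that part.
my $source = join '|', @parts;

# Recompile the joined string into a single regex object.
my $r = qr/$source/;

print "source: $source\n";
for my $input ('cat', 'CAT', 'dog', 'DOG') {
    print "$input: ", ($input =~ $r ? "match" : "no match"), "\n";
}
```

The case-insensitive flag applies only to the second alternative, which is why CAT is not expected to match while DOG is.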

Re: Composing regex's dynamically
by RMGir (Prior) on Apr 27, 2011 at 12:54 UTC
    I was going to suggest Regex::PreSuf, but when searching for that, I found Regexp::Assemble. That seems like exactly what you're looking for...

    (Although Corion's suggestion should work just fine, too, if you're just alternating strings...)


    Mike
Re: Composing regex's dynamically
by JavaFan (Canon) on Apr 28, 2011 at 15:13 UTC
    $r = qr/this-stuff/x;
    $r = qr/this-stuff $more/x if $more;
    where $more is either "| more-stuff", or "(*FAIL)". It doesn't matter whether $more is a plain string, or a $qr construct.
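A minimal sketch of the always-failing variant, written in the form from the original question where the | stays at the main level and (*FAIL) fills the optional branch in the off mode. The helper build and the patterns cat and dog are hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: build the regex with or without the optional branch.
sub build {
    my ($enabled) = @_;

    # In the off mode the branch is (*FAIL), which can never match,
    # so the alternation behaves as if the branch were absent.
    my $more = $enabled ? qr/dog/ : qr/(*FAIL)/;

    return qr/\A (?: cat | $more ) \z/x;
}

my $on  = build(1);
my $off = build(0);

for my $input ('cat', 'dog') {
    print "$input: on=",  ($input =~ $on  ? "match" : "no match"),
          " off=",        ($input =~ $off ? "match" : "no match"), "\n";
}
```

Here $more is a qr// object in both modes, so the main pattern never has to change shape.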

    Now, I often assemble regexes from subparts. And I strongly prefer the subparts to be strings over compiled regexes. The compiled regexes contain extra sets of parens and setting of modifiers. I've built regexes long enough that the additional overhead of having all the subparts be qr// constructs made the difference between slow and "just takes too long".

    You need a few backslashes less when using qr// instead of qq//. And there are some edge cases where the heuristic parsing of quoted constructs decides differently but they're obscure enough I can't even remember them.
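To illustrate the backslash point, here is the same hypothetical pattern (a simple decimal number) written three ways. Single-quoted strings are included as a third option that is not mentioned above.

```perl
use strict;
use warnings;

my $from_qr   = qr/\d+\.\d+/;      # regex quoting: backslashes written once
my $qq_source = "\\d+\\.\\d+";     # double-quoted: every backslash doubled
my $q_source  = '\d+\.\d+';        # single-quoted: backslashes kept as written

print "qq and q sources are identical\n" if $qq_source eq $q_source;

# All three end up as equivalent regexes once compiled.
for my $re ($from_qr, qr/$qq_source/, qr/$q_source/) {
    print "3.14: ", ('3.14' =~ /\A$re\z/ ? "match" : "no match"), "\n";
}
```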

      Interesting. Does the wrapping of a compiled regex with the remembered modifiers cost more than adding plain non-capturing parens that you would need anyway? Or in your situation did you not need parens at all?

      I wonder if composing actual pre-compiled regex's is any better than converting to string and re-compiling?

        I've never researched that, but if I make building blocks to assemble regexes, I typically surround that with (?:), unless it's not going to matter. So, I'd write:
        my $bblock1 = "[a-f]";
        my $bblock2 = "foo";
        my $bblock3 = "(?:$bar|$baz)";
        And then there's:
        my $bblock4 = "[a-i]";
        my $bblock5 = "[A-I]";
        my $bblock6 = "[0-9]";
        my $bblock7 = "$bblock4|$bblock5|$bblock6";
        $bblock7 =~ s/\Q]|[\E//g; # Make one range
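As a runnable sketch of that last trick: because the building blocks are plain strings, the alternation of three character classes can be collapsed into one class by deleting each "]|[" seam. The test inputs are made up for illustration.

```perl
use strict;
use warnings;

my $bblock4 = "[a-i]";
my $bblock5 = "[A-I]";
my $bblock6 = "[0-9]";

# Start with an alternation of three character classes ...
my $bblock7 = "$bblock4|$bblock5|$bblock6";

# ... then remove each literal "]|[" seam so one class remains.
$bblock7 =~ s/\Q]|[\E//g;

print "$bblock7\n";    # prints [a-iA-I0-9]

my $re = qr/\A$bblock7+\z/;
for my $input ('aB3', 'xyz') {
    print "$input: ", ($input =~ $re ? "match" : "no match"), "\n";
}
```

This kind of rewrite is only possible while the pieces are still strings; once they are qr// objects, their stringified forms carry wrappers that the substitution would not see through.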