in reply to Tidying and simplifying a regular expression

One reason (maybe the only reason) that the stringization of a  qr// object comes wrapped in its own little non-capturing group is so that the further interpolation of something like

    my $rx = qr{ ... }xms;
    my $ry = qr{ ... }xms;
    my $rz = qr{ ... }xms;
    if ($string =~ m{ \A $rx* $ry+ $rz{2,5} \z }xms) { ... }

can work intuitively. Even the  $rz{2,5} bit works, surprisingly, although you can only push that one so far.
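
A minimal sketch of why the wrapper matters for quantifiers. The pattern and test string are made up for illustration, and the stringified form shown by the first print assumes Perl 5.14 or later:

```perl
use strict;
use warnings;

# Hypothetical pieces, chosen only for illustration.
my $rx    = qr{ab};   # compiled regex object
my $plain = 'ab';     # same pattern text, but a plain string

# Interpolated, $rx arrives as a self-contained group, so + repeats "ab".
my $qr_ok = 'ababab' =~ m{ \A $rx+ \z }x ? 1 : 0;

# Interpolated, $plain arrives as bare text, so + binds only to the "b".
my $plain_ok = 'ababab' =~ m{ \A $plain+ \z }x ? 1 : 0;

print "stringified: $rx\n";
print "qr// with + matches 'ababab':  $qr_ok\n";
print "plain with + matches 'ababab': $plain_ok\n";
```

The qr// version matches because the quantifier applies to the whole group; the plain-string version does not, because it ends up as  ab+  in the outer pattern.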

This node notwithstanding, what would one gain in the long run from a "simplified" form for the example given? How often does one build a regex in this way and then try to read it?


Give a man a fish:  <%-{-{-{-<

Re^2: Tidying and simplifying a regular expression (flags)
by LanX (Saint) on Dec 10, 2017 at 02:13 UTC
    > One reason (maybe the only reason) that the stringization of a  qr//  object comes wrapped in its own little non-capturing group is so that the further interpolation of something like

    > ( examples with appended quantifiers )

    Not really.

    perlre is actually quite explicit about why:

    > The caret tells Perl that this cluster doesn't inherit the flags of any surrounding pattern, but uses the system defaults (d-imnsx), modified by any flags specified.

    In other words: it's about preserving the flags of the embedded regex, and assuming the defaults for any flags that are not specified.

    Update: demonstration

        DB<7> $U=qr/U/ # always upper case
        DB<8> $i=qr/i${U}i/i # surrounding case insensitive
        DB<9> p $i
        (?^ui:i(?^u:U)i)
        DB<10> p 'iui' =~ $i
        DB<11> p 'iUi' =~ $i
        1
        DB<12> p 'IUI' =~ $i
        1
        DB<13> p 'IuI' =~ $i
        DB<14> p join "\n", grep { $_ =~ $i } <{i,I}{u,U}{i,I}>
        iUi
        iUI
        IUi
        IUI

    Cheers Rolf
    (addicted to the Perl Programming Language and ☆☆☆☆ :)
    Wikisyntax for the Monastery

      > It's about preserving the flags of the embedded regex ...

      Yes, and the reason that is done is, at least in part, to make composition of relatively more complex regexes from simpler  qr// components (via interpolation) work "right."



        > composition of relatively more complex regexes from simpler  qr//  components (via interpolation)

        The "interpolation" part surprises and irritates me here.

        I somehow expected that an already compiled simpler regex wouldn't need to be stringified and interpolated again.

        But this approach is surely easier to implement and most probably more robust.

        Cheers Rolf

Re^2: Tidying and simplifying a regular expression
by Dallaylaen (Chaplain) on Dec 09, 2017 at 07:48 UTC
    > This node notwithstanding, what would one gain in the long run from a "simplified" form for the example given? How often does one build a regex in this way and then try to read it?

    Not exactly a common use case, otherwise it would have been built in already... I can come up with two examples:

    • Defining a regular expression constant as a combination of smaller constants;
    • Compiling a regex from user data and caching it somewhere.

    In both cases trying to print the resulting expression for debugging leads to something incomprehensible that itself needs to be debugged.
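
    For the first case, a minimal sketch of what I mean. All constant names here are invented for illustration, and the  (?^  wrapper syntax assumes Perl 5.14 or later:

    ```perl
    use strict;
    use warnings;

    # Hypothetical constants, invented for this sketch.
    my $DIGITS = qr{\d+};
    my $SIGN   = qr{[-+]?};
    my $INT    = qr{$SIGN$DIGITS};
    my $PAIR   = qr{$INT \s* , \s* $INT}x;

    # Every level of interpolation keeps its own (?^...) wrapper,
    # so the printed form nests one group per building block.
    print "$PAIR\n";

    # Count the wrappers in the stringified result.
    my $wrappers = () = "$PAIR" =~ /\(\?\^/g;
    print "wrapper groups: $wrappers\n";
    ```

    The pattern still matches what it should; it is only the printed form that grows a wrapper for each constant used.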

      > In both cases trying to print the resulting expression for debugging leads to something incomprehensible that itself needs to be debugged.

      It's the other way round: you can trace the past qr compilation steps, which actually helps debugging.

      You are complaining about the verbosity of the debugging information, but you have to admit that your example is a very contrived edge case.

      AnomalousMonk is right to ask for common cases where this becomes a problem.

      I can see the point of a regex::tidy, but this alone is not a very convincing incentive.

      Cheers Rolf