perlancar has asked for the wisdom of the Perl Monks concerning the following question:

My development Perl is 5.18 (but I use perlbrew so I also have e.g. 5.10, 5.20). In one of my modules (DateTime::Format::Alami to be exact), I'm inserting the value of a regex during build time (to reduce startup overhead). The module is meant to be used on Perl 5.10+.

Since the Perl used for building is 5.18, the printed regexp contains the (?^...) sequence. As some of you already know, Perl 5.14 changed regexp stringification:

% perl -E'BEGIN { say $^V } say qr/a/i'
v5.10.1
(?i-xsm:a)
% perl -E'BEGIN { say $^V } say qr/a/i'
v5.18.1
(?^ui:a)

Now, aside from using Perl 5.10 or 5.12 for building, or using Perl 5.10 or 5.12 just to print the regexp during building, are there other alternatives? I'm thinking perhaps there is a CPAN module that emulates stringifying regexp in the way of older Perls.

(Really, Perl is excellent with regard to backward compatibility, but this particular change is a sore pain.)

UPDATE: Sorry, ignore the above problem. I mistakenly thought that I was printing a Regexp object during build time, but actually I am just printing a string. Let me reiterate the question: is there a way (or is there a CPAN module) to make Perl 5.14+ stringify qr/a/i as:

(?i-xsm:a)

and not:

(?^ui:a)

so if I save the stringified regex into a source code in a file, e.g.:

$re = qr/(?i-xsm:a)/;

older Perls can execute the resulting source code?

UPDATE 2014-12-31: Turns out perl5140delta already gives details about this, including the solution to my problem. So I just need to use re's regexp_pattern to extract the pattern and the modifier separately, and just form the old (pre-5.14) stringification myself. SOLVED.
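
A minimal sketch of the approach described in the update above, assuming only msix flags matter for the target Perls. The helper name old_style_stringify is hypothetical and not part of any module; re::regexp_pattern is the real function from the core re module. The flag order is chosen to resemble the outputs quoted in this thread and has no effect on matching.

```perl
use strict;
use warnings;
use re qw(regexp_pattern);   # core function, returns (pattern, modifiers) in list context

# Hypothetical helper: rebuild a pre-5.14 style "(?on-off:pattern)" string
# from a Regexp object. Only the m, s, i and x flags are carried over.
sub old_style_stringify {
    my ($re) = @_;
    my ($pat, $mods) = regexp_pattern($re);
    die "not a Regexp object\n" unless defined $pat;
    $mods = '' unless defined $mods;

    # Flags that are switched on, then flags that are switched off.
    my $on  = join '', grep { index($mods, $_) >= 0 } qw(m s i x);
    my $off = join '', grep { index($mods, $_) <  0 } qw(x i s m);

    return '(?' . $on . (length $off ? "-$off" : '') . ":$pat)";
}

print old_style_stringify(qr/a/i), "\n";
```

Note that this sketch silently drops anything an older Perl would not understand in that position (the charset modifiers such as u and a, and newer flags), so a regex whose behaviour depends on those may match differently once reloaded.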

Re: Dumping regexp for Perl versions earlier than 5.14
by Eily (Monsignor) on Oct 03, 2014 at 12:22 UTC

    I feel like I'm missing something (well, I have zero experience in creating CPAN modules, so that may be where I'm wrong) but I think you may be trying too hard.

    First, I don't get how compiling the regexes at the program build-time is better. Perl being an interpreted language, there is a build-time each time you start a program, so doing something during build-time or at the beginning of run-time is equivalent time-wise. And even if you didn't use qr-ed regexes, later versions of perl have become quite good at guessing when not to recompile regexes.

    Second, and I already have a few answers in mind for this one, (since I already feel like I'm being wrong somewhere, I'll just keep going) if the stringification of the compiled regex doesn't suit your needs, why don't you dump the pattern before compilation (something like $pat = qq/REGEX/; say $pat; $reg = qr/$pat/i;)? The first of the answers that come to mind being "It's not just the pattern, but the modifiers that must be compared".

      Like you, I'm a bit confused about just what perlancar is doing (of course, no example code is given). In Perl as in life, there are a lot of ways to 'insert' things into other things, not all of which are appropriate in a given situation. I, also, don't understand why a regex cannot be defined as a simple string and/or qr-ed from the get-go.

      ... "It's not just the pattern, but the modifiers that must be compared".

      But modifiers can be included within a string or qr regex definition (and IMHO, should only be used in this way in the case of the  /i modifier).

      c:\@Work\Perl>perl -wMstrict -le
      "print $];
       ;;
       my $rxs = '(?i) (?: a|b)';
       my $rx = qr/$rxs/xms;
       print $rx;
      "
      5.008009
      (?msx-i:(?i) (?: a|b))

      c:\@Work\Perl\monks\>perl -wMstrict -le
      "print $];
       ;;
       my $rxs = '(?i) (?: a|b)';
       my $rx = qr/$rxs/xms;
       print $rx;
      "
      5.014004
      (?^msx:(?i) (?: a|b))

        But modifiers can be included within a string or qr regex definition (and IMHO, should only be used in this way in the case of the /i modifier).
        Indeed they can, and that's actually what stringification of qr-ed regexes does. I was thinking about context-dependent changes, like use re. You could still write the whole set of modifiers in the string to be sure to get what you want no matter what the context is, though.

      In my module (DateTime::Format::Alami), I'm building a big regex from potentially hundreds of smaller bits (which are retrieved from a method and then preprocessed). This regex won't change between runs, so in order to save startup time, I'm precomputing it during dist build time. (I don't know how big the saving will be though, so perhaps this is premature optimization.)

      There's another use-case where I put a regex literal into the built source code with Dist::Zilla: inserting code-generated argument validation code. I do this in some cases where I want the autogenerated argument validation code to be embedded directly into the source code and not generated/called dynamically during runtime (which usually adds another subroutine call level or startup overhead, because in my case the argument validation code generator has quite a bit of startup time). An example:

      sub foo {
          my ($arg1, $arg2) = @_;
          # INSERT CODE VALIDATION HERE
          ...
      }

      During build, a Dist::Zilla plugin will fill in the code validation routine:

      sub foo {
          my ($arg1, $arg2) = @_;
          defined($arg1) or die "arg1 is required";
          $arg1 =~ qr/\A\w+\z/ or die "arg1 must be alphanums only";
          defined($arg2) or die "arg2 is required";
          # INSERT CODE VALIDATION HERE
          ...
      }

      I hope that clears up things a bit.

      And even if the above two use-cases are not common among other programmers, I'm still curious about how to handle the regex stringification backward-incompatibility introduced in Perl 5.14.

      doing something during build-time or at the beginning of run-time is equivalent time-wise

      True, but the cost is moved to build-time and users are not bearing it :-) Suppose it takes you 0.50s to assemble a giant regex from smaller bits taken from a bunch of submodules, while the assembled regex literal takes only 0.01s for Perl to parse. Your module users will only take the 0.01s hit at startup time and not 0.50s.

      if the stringification compiled regex doesn't suit your needs, why don't you dump the pattern before compilation

      Yes, in this case I actually can, because I'm assembling the regex from strings. So before I turn it into a Regexp object, I can print the string first. My particular problem is solved. Thanks! :) But suppose I already have a regexp object? How do I stringify it under 5.14+ so it is compatible with 5.10 & 5.12?
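
A rough sketch of the "print the string before it becomes a Regexp object" idea, as it might look in a build step. The fragment list, the variable name RE, and the helper emit_regex_source are all made up for illustration and are not taken from DateTime::Format::Alami. Modifiers are written inline in the pattern string, so no version-specific stringification is involved.

```perl
use strict;
use warnings;

# Hypothetical fragments; the real module would gather these from its submodules.
my @fragments = ('today', 'tomorrow', 'yesterday');

# Assemble the pattern as a plain string, with the modifier embedded inline.
my $pattern = '(?i)\b(?:' . join('|', map { quotemeta } @fragments) . ')\b';

# Hypothetical helper: produce one line of Perl source that rebuilds the regex
# at load time from a single-quoted string literal.
sub emit_regex_source {
    my ($name, $pat) = @_;
    (my $escaped = $pat) =~ s{([\\'])}{\\$1}g;   # escape backslashes and quotes
    return "our \$$name = do { my \$p = '${escaped}'; qr/\$p/ };\n";
}

# At build time this line would be written into the generated module.
print emit_regex_source('RE', $pattern);
```

Keeping the pattern in a single-quoted string and compiling it with qr/$p/ avoids having to worry about the regex delimiter appearing inside the pattern.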

Re: Dumping regexp for Perl versions earlier than 5.14
by AnomalousMonk (Archbishop) on Oct 03, 2014 at 14:48 UTC

    In a previous reply of mine to Eily, I pouted that you had given no example code in your OP to illustrate the problem you were asking about. I then realized that you had, in fact, given a link to the module in question, DateTime::Format::Alami, which might be considered a perfect example since it contains all the code!

    I've since taken a quick look at the module, and I must admit I still can't see how the change in regex stringification pre/post version 5.10 bears upon what you're trying to achieve. I see that you are building regexes from string fragments, but I don't see where and how you are 'inserting' a regex into another. Can you please clarify these points for me, perhaps with a short, runnable, standalone code example or two?

      OK, let's forget my original problem because it is already solved (Eily pointed out that before I convert a string into Regexp object, I can just print it first, and in my case it's true, all I have is a string). But sometimes I do have a Regexp object, for example in my other use-case the validation specification already contains a Regexp object, for example:

      $foo_spec = {
          args => {
              arg1 => {required=>1, match=>qr/\A\w+\z/},
              arg2 => {required=>1},
          },
      };
      Now the validator code generator must generate this string:
      defined($arg1) or die "arg1 must be defined";
      $arg1 =~ qr/\A\w+\z/ or die "arg1 does not match regex";
      defined($arg2) or die "arg2 must be defined";
      

      The spec can be written by others.

      Of course, I can require the 'match' key to have a value of string and not Regexp object, so this avoids the problem.

      But I'm interested in knowing whether a newer Perl (5.14+) can stringify a Regexp object in a way that is compatible with older Perls.

      Actually, my original problem with DateTime::Format::Alami is not a problem at all, because in my module I never convert to Regexp object. Sorry about that!

      I have reiterated the problem in the first post.

      Correction, it's pre/post 5.14 (perl5140delta gives the details).

        I still don't understand your use-case well enough to see why the  "(?adlupimsx-imsx)" embedded pattern-match modifier would not be useful here (see Extended Patterns in perlre).


        Give a man a fish:  <%-(-(-(-<

Re: Dumping regexp for Perl versions earlier than 5.14
by LanX (Saint) on Oct 03, 2014 at 12:15 UTC
    I'm not sure if I understand completely but shouldn't it be possible to use the old stringification in newer Perls?

    I mean why don't you build with 5.10 and trust in the backwards compatibility of 5.18?

    > Now, aside from using Perl 5.10 or 5.12 for building,

    Oops overlooked this or it was updated later...

    Never mind! :)

    Cheers Rolf

    (addicted to the Perl Programming Language and ☆☆☆☆ :)