http://qs1969.pair.com?node_id=11124218

MikeTaylor has asked for the wisdom of the Perl Monks concerning the following question:

I have a string $value that I want to transform by regexp substitution. But the pattern, replacement and flags are known only at run-time. They are specified by $pattern, $replacement and $flags. So I want to do something like
$value =~ s/$pattern/$replacement/$flags;
but of course that doesn't work as intended. Or perhaps something like this, if I could only find the right class name:
my $re = new Regexp($pattern);
my $value = $re->substitute($value, $replacement, $flags);
There has to be a way to do this ... right?

Re: Regexp substitution using variables
by choroba (Cardinal) on Nov 25, 2020 at 19:56 UTC
    Some of the flags can be moved to a non-capturing group:
    #!/usr/bin/perl
    use warnings;
    use strict;

    my $string      = 'abc';
    my $pattern     = 'B';
    my $replacement = 'X';
    my $flags       = 'i';

    $string =~ s/(?$flags:$pattern)/$replacement/;
    print $string;  # aXc

    But you can't do that for /g, /o, /r or /e.

    Update: Even string eval doesn't help, as plain interpolation of the $replacement can break if it contains a slash.

    eval "s/\$pattern/\$replacement/$flags"
    doesn't work either, as you can't put $1 into $replacement unless you always use /ee which makes it unsafe again.

    map{substr$_->[0],$_->[1]||0,1}[\*||{},3],[[]],[ref qr-1,-,-1],[{}],[sub{}^*ARGV,3]
      Thank you to all of you who have suggested the form s/(?$flags:$pattern)/$replacement/. I think this will get me much of what I need. The fact that the global-replace flag "g" doesn't work in this position is an annoying wrinkle, but I am going to take a deep breath and code up the with-g and without-g cases separately, depending on whether or not $flags =~ s/g// succeeds.
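A minimal sketch of that plan, using a hypothetical helper named apply_regsub (it is not from any module): strip every g from the flags, put the remaining pattern modifiers inline with (?$flags), and choose the /g or non-/g substitution accordingly. The restriction to i, m, s and x is an assumption. As written, the replacement string is inserted literally, so a $1 inside it is not expanded.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Hypothetical helper: run-time s/// with /g handled by a separate branch.
# NOTE: $replacement is inserted literally; "$1" in it is NOT expanded.
sub apply_regsub {
    my ($value, $pattern, $replacement, $flags) = @_;
    $flags = '' unless defined $flags;
    my $global = ($flags =~ s/g//g);    # true if any 'g' was removed
    die "unsupported flags '$flags'\n" if $flags =~ /[^imsx]/;

    # Only pattern modifiers are left, so they can go inline.
    my $re = length $flags ? qr/(?$flags)$pattern/ : qr/$pattern/;

    if ($global) {
        $value =~ s/$re/$replacement/g;    # replace every match
    }
    else {
        $value =~ s/$re/$replacement/;     # replace the first match only
    }
    return $value;
}

print apply_regsub('aBc bB', 'b', 'X', 'i'),  "\n";    # aXc bB
print apply_regsub('aBc bB', 'b', 'X', 'gi'), "\n";    # aXc XX
```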
        > with-g and without-g cases separately, depending on whether or not $flags =~ s/g// succeeds.

        That's reasonable, because /g is not a simple modifier changing the match-rules, it turns the "replace" into a different "replace_all" command with very different behavior.

        Cheers Rolf
        (addicted to the Perl Programming Language :)
        Wikisyntax for the Monastery

        but I am going to take a deep breath and code up the with-g and without-g cases separately
        Why are people so obsessed with trying to solve a complex problem like this in a single line of code, just because it can be done in one line of code in a Perl script? I don't mean you specifically, but people in general, like apparently most of those who replied to this thread.

        Splitting this up into two parts makes sense, because /g is not a modifier of the pattern (as it is in several other languages), but of the substitution. Something like this looks acceptable to me, as the (obvious) redundancy is actually quite limited:

        if ($flags =~ s/g//) {
            s/(?$flags)$pattern/replacement($replacement)/ge;
        }
        else {
            s/(?$flags)$pattern/replacement($replacement)/e;
        }
        where you still have to provide the sub replacement.

        Other flags cannot really be coded this way, but there's no need to provide for /o or /r at all, and allowing people to use the /e flag in a config file simply looks dangerous to me. If people really wanted to use /e, it would likely be for just a handful of specific cases, and you could instead code a simpler solution for those cases explicitly in your script (simpler for the user, not necessarily for you) rather than having them write convoluted Perl code.

        That real danger of allowing ordinary users to run arbitrary code is also why I really don't like the use of eval. It also means that special care must be taken when writing the sub replacement. You can mitigate the danger by using a module like String::Interpolate to embed captured values while disallowing access to the rest of the intestines of the script.
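One way the replacement sub from the if/else above could look, as a sketch only: it assumes Perl 5.26 or later for @{^CAPTURE}, and it only understands $N and ${N} in the template, so nothing from the config is ever run as code. The sample value, pattern, template and flags are made up for illustration.

```perl
#!/usr/bin/env perl
use strict;
use warnings;
use v5.26;    # assumption: @{^CAPTURE} needs Perl 5.26 or later

# Expand $N and ${N} in $template from the captures of the match that is
# currently being replaced. Missing groups expand to ''. No code is run.
sub replacement {
    my ($template) = @_;
    my @caps = @{^CAPTURE};    # copy before the inner s/// resets them
    $template =~ s/\$(?:\{(\d+)\}|(\d+))/
        my $n = defined $1 ? $1 : $2;
        ($n >= 1 && defined $caps[$n - 1]) ? $caps[$n - 1] : '';
    /ge;
    return $template;
}

# Sample run-time values (made up for this example).
my $value       = 'a=1, B=2';
my $pattern     = '([a-z])=(\d)';
my $replacement = '${2}:$1';
my $flags       = 'gi';

my $global = ($flags =~ s/g//g);    # strip 'g'; true if it was present
my $re     = length $flags ? qr/(?$flags)$pattern/ : qr/$pattern/;

if ($global) {
    $value =~ s/$re/replacement($replacement)/ge;
}
else {
    $value =~ s/$re/replacement($replacement)/e;
}
print "$value\n";    # 1:a, 2:B
```

Because capture variables are dynamically scoped, the sub called from the /e part sees the captures of the match being replaced, which is what lets the template stay plain data.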

        Well, this gets me much of what I need ... but I can't get back-references, either with $1 or \1. In either case, they appear as literals. Any ideas, other than eval?

        depending on whether or not $flags =~ s/g// succeeds

        Testing $flags =~ /g/; is simpler.

      Even string eval doesn't help, as plain interpolation of the $replacement can break if it contains a slash

      My first thought was to use eval but I hit a brick wall when I tried...
      I was thinking of first changing each slash in $replacement to a double slash, then using eval to do the substitution, but I got stuck getting the result of the substitution.

      $replacement =~ s/\\/\\\\/g;
      $string =~ eval "s/\$pattern/\$replacement/$flags";
      But that doesn't do it...

Re: Regexp substitution using variables
by jwkrahn (Abbot) on Nov 25, 2020 at 19:49 UTC

    For certain flags you can do:

    $value =~ s/(?$flags:$pattern)/$replacement/;

      This answer and other answers have suggested:

      $value =~ s/(?$flags:$pattern)/$replacement/;
      In my answer I used a subtle variation:
      $value =~ s/(?$flags)$pattern/$replacement/;
      which worked as expected in my test code. So I went off to the documentation and sure enough it shows both but it does not (at least to my eyes) show what the difference is between them. Can anyone explain if there is a difference and when it practically matters? It doesn't seem to matter here.

      On a different note - do other Monks prefer that questions like this get asked in the thread, or that they get their own new thread?

        > does not (at least to my eyes) show what the difference is between them.

        you are comparing (?$flags)$pattern with (?$flags:$pattern).

        In your example there is no difference, but in the second form, with the pattern inside the group, the reach of the modifiers is limited to that group.

        DB<24> p 'xX' =~ /(?i:X)X/
        1
        DB<25> p 'xX' =~ /(?i:X)x/

        DB<26> p 'xX' =~ /(?i)Xx/
        1
        DB<27>
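To see when the two forms differ in practice, here is a small sketch (the sample strings are arbitrary) in which the interpolated flag group is followed by more pattern text: with (?$flags:$pattern) the /i stops at the closing parenthesis, while (?$flags)$pattern keeps it in force for the trailing c as well.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

my ($flags, $pattern) = ('i', 'b');

my $grouped = "(?$flags:$pattern)";    # /i applies only inside the group
my $leading = "(?$flags)$pattern";     # /i stays on to the end of the enclosing group

for my $str ('xBc', 'xBC') {
    my $g = $str =~ /x${grouped}c/ ? 'match' : 'no match';
    my $l = $str =~ /x${leading}c/ ? 'match' : 'no match';
    print "$str: grouped=$g, leading=$l\n";
}
# xBc: grouped=match, leading=match
# xBC: grouped=no match, leading=match
```

The difference only shows up when something follows the interpolated piece inside the same group; when the whole pattern comes from the variable, as in the original question, both forms behave the same.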


Re: Regexp substitution using variables
by kcott (Archbishop) on Nov 25, 2020 at 22:11 UTC

    G'day MikeTaylor,

    Or perhaps something like this, if I could only find the right class name:

    my $re = new Regexp($pattern);
    my $value = $re->substitute($value, $replacement, $flags);

    About 15 or 20 years ago, I read a book by Damian Conway called "Object Oriented Perl". In it, he shows the creation of blessed objects using various things including regular expressions: your post reminded me of this.

    I don't own the book. If you can get a copy (you possibly already own one) it's certainly worth reading even though it's now quite old. If you follow the link I provided, you'll see a free PDF copy is offered; however, it looks like you need to "add to cart" which probably also means you have to "create an account" — I didn't follow through on this.

    Here's a very quick-and-dirty implementation of a class which blesses regular expressions.

    package Regex;

    use strict;
    use warnings;

    sub new {
        my ($class, $pattern, $flags) = @_;
        my $flag_part = defined $flags ? "(?$flags)" : '';
        my $re_part   = "\Q$pattern";
        return bless qr{$flag_part$re_part}, $class;
    }

    sub match {
        my ($self, $str) = @_;
        return $str =~ $self ? 'YES' : 'NO';
    }

    sub replace {
        my ($self, $str, $new) = @_;
        $str =~ s/$self/$new/;
        return $str;
    }

    1;

    If you want to use something like this in production code, it'll need a lot more work. What I've provided is only intended to demonstrate the basic principles involved. The book would probably provide a lot more information; but I don't remember details of something I read about two decades ago.

    Here's a test of that module:

    #!/usr/bin/env perl

    use strict;
    use warnings;

    use FindBin;
    use lib "$FindBin::Bin/../lib";

    use Regex;

    my $pat = 'b';
    my $case_sens_re_obj   = Regex::->new($pat);
    my $case_insens_re_obj = Regex::->new($pat, 'i');

    my $test_string = 'ABC';

    print 'case_sens_re_obj match: ', $case_sens_re_obj->match($test_string), "\n";
    print 'case_insens_re_obj match: ', $case_insens_re_obj->match($test_string), "\n";
    print 'case_sens_re_obj substitution: ', $case_sens_re_obj->replace($test_string, '_'), "\n";
    print 'case_insens_re_obj substitution: ', $case_insens_re_obj->replace($test_string, '_'), "\n";

    Output:

    case_sens_re_obj match: NO
    case_insens_re_obj match: YES
    case_sens_re_obj substitution: ABC
    case_insens_re_obj substitution: A_C

    Unrelated but important: Please avoid indirect object syntax; e.g. new Regexp($pattern). See "perlobj: Indirect Object Syntax" for a discussion of problems with this syntax. The above example would be much better as Regexp::->new($pattern) — "perlobj: Invoking Class Methods" explains that.

    P.S. When checking links prior to posting, I noticed "PDF for FREE" has been replaced with the text, "pBook + PDF". I don't know what that means and whether the PDF is still free or not (there would have only been a matter of minutes between finding the link in the first place and checking I had correctly included it in my post).

    — Ken

      Thanks, Ken, this is helpful. Point taken on Class->new, too.

      The problem with this class, like the solutions above that use s/(?$flags:$pattern)/$replacement/ directly, is that it doesn't handle back-references. (That's true even with the \Q removed from the definition of $re_part in the class constructor.)

Re: Regexp substitution using variables
by Fletch (Bishop) on Nov 25, 2020 at 20:38 UTC

    I'm trying to think of some application where you'd reasonably need to accommodate random substitutions with possible /g modifiers but I'm coming up blank (but probably need more caffeine to boot . . .). I started to post something mentioning string eval (which as has been pointed out isn't the answer there either) but something about the original question has a not-too-faint whiff of "XY problem" about it.

    Could you step back a hair more and explain why you think you need to run substitutions with arbitrary modifier flags? It may be that you don't actually and you could really get by with one of the prior suggestions (like moving compatible flags onto the front of the pattern). Or maybe you could work with some sort of (handwaving vigorously here) plugin / module system where you write substitution classes which implement a specific role that . . . /shrug

    The cake is a lie.
    The cake is a lie.
    The cake is a lie.

      I understand your scepticism; this does indeed feel like one of those "How do I do X?" questions where the answer is "Don't do X, do Y instead". (Is that what you meant by an "XY problem"?) My situation is basically that I need to run a config file that specifies regular-expression substitutions. Specifically, my program is generating USMARC-format bibliographic records, and a config file says things like "in the 245$a field, replace /foo/ with 'bar' globally". In fact, the config looks like this:
      "245$a": [ { "op": "regsub", "from": "foo", "to": "bar", "flags": "g" } ]
      If you can think of a better way to do this, I am all ears — but bear in mind I do need the full power of regexp substitutions, e.g. the ability to include parenthesized sub-expressions in the "from" part and $1 back-references in the "to" part.

        This is interesting. Can you provide some additional examples, including more esoteric ones, and possibly a little sample text? I just want to look at the challenges you're facing more pragmatically. Test cases would be fantastic.


        Dave

        "245$a": [ { "op": "regsub", "from": "foo", "to": "bar", "flags": "g" } ]

        This seems like a good starting point. See neilwatson's article How to ask better questions using Test::More and sample data for the way forward. Once you have a few working test cases defined, the only thing left is to define about a million more, including generous edge and corner cases and exception cases! No problem. :)


        Give a man a fish:  <%-{-{-{-<

        > ... e.g. the ability to include parenthesized sub-expressions in the "from" part and $1 back-references in the "to" part.

        Honestly .... store the full real regexp in your config and eval it (or eval it into a sub to optimize execution time)

        "245$a": [ { "regexp": 's/(foo|bar)/He said "$1"/' } ]

        There is no way to "safely" abstract the capture-var away, it has to be compiled into the regex and this needs an eval or /ee with all connected security issues.
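A sketch of the "eval it into a sub" idea, assuming the config stores complete s/// expressions as in the example above (compile_rule is a hypothetical name, and the /g on the sample rule is added for illustration). The string eval happens once per rule and the resulting code ref is then called for each value. It carries exactly the security caveat described here: anything in the rule runs as Perl.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Hypothetical helper: compile a stored s/// expression once into a code ref.
# WARNING: string eval runs whatever Perl the rule contains, including code
# hidden in the replacement, so only use this with fully trusted configs.
sub compile_rule {
    my ($src) = @_;
    my $code = eval "sub { my \$v = shift; \$v =~ $src; return \$v; }";
    die "cannot compile rule '$src': $@" unless ref $code eq 'CODE';
    return $code;
}

my $rule = compile_rule('s/(foo|bar)/He said "$1"/g');
print $rule->('foo and bar'), "\n";    # He said "foo" and He said "bar"
```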

        > but bear in mind I do need the full power of regexp substitutions,

        I have the impression your JSON format is an attempt to make it language agnostic. But the "full power" means you will be stuck with Perl.

        And full power means that security becomes an illusion.

        DB<111> $_="abc"

        DB<112> s/(.)/@{[print "what? --> $1\n"]}/g
        what? --> a
        what? --> b
        what? --> c

        DB<113>


Re: Regexp substitution using variables
by Bod (Parson) on Nov 25, 2020 at 19:53 UTC

    You need to add the modifiers at the start of the pattern, like this:

    use strict;
    use warnings;

    my $pattern     = 'test';
    my $replacement = 'New';
    my $flags       = 'i';
    my $value       = 'My Test Text';

    $value =~ s/(?$flags)$pattern/$replacement/;
    print "$value\n";
    This will print:
    My New Text

Re: Regexp substitution using variables
by MikeTaylor (Acolyte) on Nov 25, 2020 at 23:33 UTC
    Here is what I am doing at the moment:
    $replacement =~ s/\\/\\\\/g;
    eval "\$res =~ s/$pattern/$replacement/$flags";
    It's working, and crucially supports back-references — unlike the $res =~ s/(?$flags:$pattern)/$replacement/ solution.

    Of course, the use of eval gives me the heebie-jeebies; but I'm not going to lose too much sleep as we already need to trust the people who write the configuration files that will contain the values used in the eval.

      Partly in answer to choroba's challenge, here's an approach that works with forward/backslashes, escape sequences and capture variables in replacement strings. Whether it will answer your needs is another question. A fixup step for forward slashes is necessary. Works under Perl versions 5.8.9 and 5.30.3.

      Win8 Strawberry 5.8.9.5 (32) Thu 11/26/2020 4:08:05
      C:\@Work\Perl\monks
      >perl
      use strict;
      use warnings;

      my $pattern     = '(\\\\tEs/Ti//N\x67\\\)';
      my $replacement = '\\\Fr/es//h\\\\ \U$1';
      my $flags       = 'i';

      # $got_g is true if /g modifier present in flags.
      # ($flags, my $got_g) = sanitize_flags_detect_g($flags);

      fixup_forward_slashes($pattern, $replacement);

      my $value = 'My \Tes/ti//ng\ Text';
      print "replacement '$replacement' \n";

      my $eval_string = "\$value =~ s/$pattern/$replacement/$flags";
      print "eval_string '$eval_string' \n";

      eval $eval_string;
      print "eval err '$@' \n";
      print "output '$value' \n";

      sub fixup_forward_slashes { s{/}'\/'g for @_; }
      ^Z
      replacement '\\Fr\/es\/\/h\\ \U$1'
      eval_string '$value =~ s/(\\tEs\/Ti\/\/N\x67\\)/\\Fr\/es\/\/h\\ \U$1/i'
      eval err ''
      output 'My \Fr/es//h\ \TES/TI//NG\ Text'
      It's awkward that a \ single literal backslash in the input/output string must be represented by a \\ double backslash in the substitution and by \\\ triple or \\\\ quadruple backslashes in the single-quoted pattern/replacement strings, but that's single/double-quotish backslash handling for ya. If the pattern/replacement strings were taken from a file, it would be possible to just use double backslashes.



      > It's working

      OK, now try to include a slash into the pattern or replacement.

      we already need to trust the people who write the configuration files that will contain the values used in the eval.

      Beware Hanlon's Razor. Are you trusting them to be competent or just benign? In your shoes I would be untainting their input very conservatively.
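In that spirit, a conservative check of the config before anything is applied might look like the following. It is only a sketch: validate_rules is a hypothetical name, and both the flag whitelist (g, i, m, s, x) and the rule that "to" may contain only $N or ${N} are assumptions. JSON::PP is used because it ships with Perl. A pattern built from a variable that contains (?{ ... }) is refused by Perl unless use re 'eval' is in effect, so the compile check also rejects embedded code.

```perl
#!/usr/bin/env perl
use strict;
use warnings;
use JSON::PP ();    # ships with Perl since 5.14

# Hypothetical validator for the "regsub" config format from this thread.
# Returns a list of error messages; an empty list means every rule passed.
sub validate_rules {
    my ($config) = @_;
    my @errors;
    for my $field (sort keys %$config) {
        for my $rule (@{ $config->{$field} }) {
            my $op    = defined $rule->{op}    ? $rule->{op}    : '';
            my $from  = $rule->{from};
            my $to    = defined $rule->{to}    ? $rule->{to}    : '';
            my $flags = defined $rule->{flags} ? $rule->{flags} : '';

            if ($op ne 'regsub') {
                push @errors, "$field: unknown op '$op'";
                next;
            }
            # Whitelist: /g plus plain pattern modifiers; no /e, /ee, /r or /o.
            push @errors, "$field: bad flags '$flags'"
                unless $flags =~ /\A[gimsx]*\z/;
            # The pattern must compile (this also rejects (?{ ... }) code).
            push @errors, "$field: pattern does not compile"
                unless defined $from && eval { my $re = qr/$from/; 1 };
            # In 'to', a '$' may only start $N or ${N}; '@' is not allowed.
            push @errors, "$field: unsupported \$ or \@ in 'to'"
                if $to =~ /\$(?!\d|\{\d+\})|\@/;
        }
    }
    return @errors;
}

my $config = JSON::PP::decode_json(
    '{ "245$a": [ { "op": "regsub", "from": "foo", "to": "bar", "flags": "g" } ] }'
);
my @errors = validate_rules($config);
print @errors ? join("\n", @errors) . "\n" : "config ok\n";
```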


      🦛

      It's working, and crucially supports back-references

      That's interesting, as I cannot get this to support back-references... This is very similar to my initial attempt. So I have attempted to replicate it:

      use strict;

      my $pattern     = '(testing)';
      my $replacement = 'New \1';
      my $flags       = 'i';
      my $value       = 'My Testing Text';

      $replacement =~ s/\\/\\\\/g;
      eval "\$value =~ s/$pattern/$replacement/$flags";
      print "$value\n";
      This prints
      My New \1 Text
      It doesn't substitute the capture.

        The OPed sort of problem is tricky, but for this specific iteration:

        Win8 Strawberry 5.8.9.5 (32) Wed 11/25/2020 22:12:13
        C:\@Work\Perl\monks
        >perl
        use strict;
        use warnings;

        my $pattern = '(testing)';
        my $replacement = 'New \U$1';
        my $flags = 'i';
        my $value = 'My Testing Text';

        ### $replacement =~ s/\\/\\\\/g;
        print "replacement '$replacement' \n";
        eval "\$value =~ s/$pattern/$replacement/$flags";
        print "$value\n";
        ^Z
        replacement 'New \U$1'
        My New TESTING Text
        (An escaped backreference \1 is not kosher in a replacement string anyway; it should be in $1 form.)
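A quick way to see this, as a sketch: in ordinary Perl code a \1 on the replacement side is still accepted and treated as $1, but under use warnings Perl complains that it is better written as $1. The substitution is compiled inside a string eval here only so that the $SIG{__WARN__} handler is already installed when the compile-time warning is raised.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Collect warnings instead of printing them.
my @warnings;
local $SIG{__WARN__} = sub { push @warnings, $_[0] };

my $value = 'My Testing Text';
# Compiled at run time, so the handler above sees the syntax warning.
eval q{ $value =~ s/(testing)/New \1/i; 1 } or die $@;

print "$value\n";    # My New Testing Text
print "warning: $_" for @warnings;
```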

        Update: Here's a version of the example code that better illustrates the process of building the evaluation string:

        Win8 Strawberry 5.8.9.5 (32) Wed 11/25/2020 22:45:44
        C:\@Work\Perl\monks
        >perl
        use strict;
        use warnings;

        my $pattern = '(testing)';
        my $replacement = 'New \U$1';
        my $flags = 'i';
        my $value = 'My Testing Text';

        print "replacement '$replacement' \n";

        my $eval_string = "\$value =~ s/$pattern/$replacement/$flags";
        print "eval_string '$eval_string' \n";

        eval $eval_string;
        print "$value\n";
        ^Z
        replacement 'New \U$1'
        eval_string '$value =~ s/(testing)/New \U$1/i'
        My New TESTING Text



Re: Regexp substitution using variables
by BillKSmith (Monsignor) on Nov 26, 2020 at 15:36 UTC
    Here is a solution using eval. Some care is required in using escapes. It works with or without the OO interface.
    use strict;
    use warnings;
    use Test::More tests => 2;

    my $pattern     = '\Aabc\/';
    my $replacement = '123\/';
    my $flags       = 'i';
    my $value       = 'ABC/def';
    my $expected    = '123/def';

    my $command = "\$value =~ s/$pattern/$replacement/$flags";
    diag $command;
    eval $command;
    ok( $value eq $expected, 'use eval directly' );

    $value = 'ABC/def';
    my $re = new Regexp($pattern);
    $value = $re->substitute( $value, $replacement, $flags );
    ok( $value eq $expected, 'use eval in class' );

    package Regexp;

    sub new {
        my ( $class, $pattern ) = @_;
        my $new_object = bless \$pattern, $class;
        return $new_object;
    }

    sub substitute {
        my ( $self, $value, $replacement, $flags ) = @_;
        my $pattern = $$self;
        my $command = "\$value =~ s/$pattern/$replacement/$flags";
        main::diag $command;
        eval $command;
        return $value;
    }

    OUTPUT:

    1..2
    # $value =~ s/\Aabc\//123\//i
    ok 1 - use eval directly
    # $value =~ s/\Aabc\//123\//i
    ok 2 - use eval in class
    Bill