harangzsolt33 has asked for the wisdom of the Perl Monks concerning the following question:

Dear Fellow Monks, I have a problem in this script with the regex replace. It won't simply replace the string correctly. It will add extra / signs in front and after the replace. Why is that??
#!/usr/bin/perl -w use strict; use warnings; my $S = 'ABC ABC CBA ACB CAB BAC CAB CAB CARAB CACBA AXCK'; $S = Replace($S, ' A', '-@'); print "\n$S"; exit; ################################################## # v2019.12.6 # This function scans string S and replaces the # first N occurrences of string A with string B # and returns a new string. If N is -1 then only # the last instance is replaced. # Usage: STRING = Replace(STRING_S, STRING_A, [STRING_B, [N]]) # sub Replace { # First, we make sure that required arguments are available # and any special scenarios are handled correctly. defined $_[0] or return ''; defined $_[1] or return $_[0]; my $B = defined $_[2] ? $_[2] : ''; my $N = defined $_[3] ? $_[3] : 0x7FFFFFFF; my ($LA, $LB) = (length($_[1]), length($B)); ($N && $LA && $LA <= length($_[0])) or return $_[0]; my ($LAST, $F, $X) = (0, 0, $_[0]); if ($N > 0x7FFFFFFE) { # If we get here, it means that N was omitted and # so there's no restriction on how many times we # have to replace. We just replace all of them, # and it doesn't matter if we replace from right # to left or from left to right. my $A = $_[1]; # $X =~ s/\Q$A\E/\Q$B\E/g; # THIS IS NOT RIGHT $X =~ s/\Q$A\E/$B/g; # THIS IS CORRECT. # thank you, Corion. } elsif ($N < 0) { # If we get here, we must not replace every # instance, and we must search from right to left. $F = length($X); while (($F = rindex($X, $_[1], $F)) >= 0) { substr($X, $F, $LA) = $B; ++$N or last; } } elsif ($LA == $LB) { # If we get here, we must replace only N number # of instances, and the output string will be the # same length as the input string. # (Search from left to right.) while (($F = index($X, $_[1], $F)) >= 0) { substr($X, $F, $LA) = $B; $F += $LB; --$N or last; } } else { # In this case, output string will NOT be the # same length as the input string. # We must replace only N number of instances, # and we go from left to right. $X = ''; while (($F = index($_[0], $_[1], $F)) >= 0) { $X .= substr($_[0], $LAST, $F - $LAST); $X .= $B; $F += $LA; $LAST = $F; --$N or last; } $X .= substr($_[0], $LAST); } return $X; }

Replies are listed 'Best First'.
Re: 1000000th question about regex lol
by Corion (Patriarch) on Dec 07, 2019 at 07:09 UTC

    No. Your program does not add / signs.

    It adds backslashes (\ "signs"), and this is unsurprising:

    $X =~ s/\Q$A\E/\Q$B\E/g; # THIS IS NOT RIGHT

    On the replacement side of s///, \Q and \E apply quotemeta to the replacement string.

    You want quotemeta only on the regex side of s/// to escape all regex meta characters, and usually never on the replacement side.

    If you leave them out, you get the plain replacement:

    $X =~ s/\Q$A\E/$B/g; # THIS IS NOT RIGHT
      Oh, thank you!! I didn't know that. I corrected the problem, and it seems to work perfectly now!
Re: 1000000th question about regex lol (updated)
by AnomalousMonk (Archbishop) on Dec 07, 2019 at 11:55 UTC

    It's hard to tell just what your requirements are from the posted code and of course there's no test set (please see How to ask better questions using Test::More and sample data), but here's a regex-based attempt. More degenerate and edge/corner-case test cases are needed.
    File replace_1.pl:

    Output:
    c:\@Work\Perl\monks\harangzsolt33>perl replace_1.pl # replace ALL instances of search string (left-to-right) ok 1 - ' A' -> '-@' max times (unlimited replacements) ok 2 - 'A' -> 'xx' max times (unlimited replacements) ok 3 - ' A' -> '-@' 3 times ok 4 - 'A' -> 'xx' 3 times (unlimited replacements) ok 5 - ' A' -> '-@' 4 times ok 6 - 'A' -> 'xx' 4 times (unlimited replacements) ok 7 - ' A' -> '-@' 99 times ok 8 - 'A' -> 'xx' 99 times (unlimited replacements) # replace ALL instances of search string (right-to-left) ok 9 - ' A' -> '-@' -3 times ok 10 - 'A' -> 'xx' -3 times (unlimited replacements) ok 11 - ' A' -> '-@' -4 times ok 12 - 'A' -> 'xx' -4 times (unlimited replacements) ok 13 - ' A' -> '-@' -99 times ok 14 - 'A' -> 'xx' -99 times (unlimited replacements) # replace N instances of search string (left-to-right) ok 15 - ' A' -> '-@' 0 times ok 16 - 'A' -> 'xx' 0 times ok 17 - ' A' -> '-@' 1 times ok 18 - 'A' -> 'xx' 1 times ok 19 - ' A' -> '-@' 2 times ok 20 - 'A' -> 'xx' 2 times # replace N instances of search string (right-to-left) ok 21 - ' A' -> '-@' -1 times (1 right-most replacement) ok 22 - 'A' -> 'xx' -1 times (1 right-most replacement) ok 23 - ' A' -> '-@' -2 times (2 right-most replacements) ok 24 - 'A' -> 'xx' -2 times (2 right-most replacements) 1..24 ok 25 - no warnings 1..25

    Update: It finally dawned on me that left/right search could be a lot simpler. This

    my $rx_srch = $replace_left_to_right ? qr{ .*? \K $srch }xms # search for leftmost match : qr{ .* \K $srch }xms # search for rightmost match ;
    works with all the test cases posted above and a few more I've added since posting. (And  $rx_not_srch is no longer needed.)


    Give a man a fish:  <%-{-{-{-<

      Yes, I am sorry. Here is the explanation: I wanted to write a sub that does not always replace every occurrence of a substring. The caller can specify to only replace the first two matches or the last three matches or anything. But when every single match has to be replaced, I realized that I can use a regex to do that. But the regex replace doesn't get to run unless you omit the fourth argument.
        The caller can specify to only replace the first two matches or the last three matches or anything.

        By "or anything", do you mean that it should be possible to replace a substring that's somewhere in the middle, but not anchored at either end? My understanding of the code and discussion so far is that any sequence of substring replacements must be anchored to an end of the string, but "or anything" makes me wonder. If you have a counterexample, please post it as a test case.


        Give a man a fish:  <%-{-{-{-<

Re: 1000000th question about regex lol
by AnomalousMonk (Archbishop) on Dec 09, 2019 at 04:54 UTC

    Don't know if you're still interested in this, but here's a split-based version. Don't know if it's faster/slower/etc. Didn't know how you would want a zero-length (empty) search string handled, so I special-cased it. Tested with all the previously posted test cases and more. Works with pre-5.10 Perl versions.


    Give a man a fish:  <%-{-{-{-<