http://qs1969.pair.com?node_id=81881

Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

This seems like it should be easy to do, but I haven't mustered the intellectual capital to get it done gracefully without a brute-force procedural approach. Can't this be done with regexps?

I need to swap the {x}th and all succeeding occurences of a given character within a string. Example: if x is 3 and the character "e" should be replaced by "1", then:
"Terence and Philip are sweet"
Should be changed to:
"Terenc1 and Philip ar1 sw11t".
Any help will be gratefully received.

Thanks, -Standfast.

  • Comment on Replacing a given character starting with the xth occurence in a string

Replies are listed 'Best First'.
Re: Replacing a given character starting with the xth occurence in a string
by quent (Beadle) on May 21, 2001 at 03:44 UTC

    One way

    my $nth = 3; my $rep = 1; my $chr = 'e'; my $i = 1; $_ = "Terence and Philip are sweet"; s/($chr)/$i++<$nth?$1:$rep/eg; print;
(bbfu) (another way) Re: Replacing a given character starting with the xth occurence in a string
by bbfu (Curate) on May 21, 2001 at 04:16 UTC

    Another way:

    my $nth = 3; my $rep = 1; my $chr = 'e'; my $str = "Terence and Philip are sweet\n"; my $pos = 0; $pos = index($str, $chr, $pos+1) for(1..$nth); substr($str, $pos) =~ s/$chr/$rep/g; print $str;

    You could, of course, move the $pos finding part into a do block within the substr expression but it's a bit unwieldy.

    Update: Well, my method's a little bit more wordy then quent's but it's ever-so-slightly faster, almost definately because it avoids doing the string-eval. I don't think it's a significant difference, though. I was just curious. :-)

    my $s = "Terence and Philip are sweet\n"; my $c = 'e'; my $r = 1; my $n = 3; sub mine { my ($str, $chr, $rep, $nth) = @_; my $pos = 0; $pos = index($str, $chr, $pos+1) for(1..$nth); substr($str, $pos) =~ s/$chr/$rep/g; return $str; } sub quents { my ($str, $chr, $rep, $nth) = @_; my $i = 1; $str =~ s/($chr)/$i++<$nth?$1:$rep/eg; return $str; } use Benchmark qw( timethese cmpthese ); cmpthese(500000, { "Mine" => sub { mine ($s, $c, $r, $n) }, "Quent's" => sub { quents ($s, $c, $r, $n) }, }); __END__ Benchmark: timing 500000 iterations of Mine, Quent's... Mine: 14 wallclock secs (14.08 usr + 0.00 sys = 14.08 CPU) @ 35 +511.36/s (n=500000) Quent's: 19 wallclock secs (18.58 usr + 0.03 sys = 18.61 CPU) @ 26 +867.28/s (n=500000) Rate Quent's Mine Quent's 26867/s -- -24% Mine 35511/s 32% --

    bbfu
    Seasons don't fear The Reaper.
    Nor do the wind, the sun, and the rain.
    We can be like they are.

      Your solution has a few subtle bugs, try it with:
      print mine ("Terence and Philip are sweet", 'c', 'Z', 3);
      This happens because $pos is set to -1 after not finding a second 'c', and since you are indexing from $pos + 1, which has become 0, index searches from the start of the string, again.

      You should also initialize $pos to -1, not zero, for similar reasons (otherwise it ignores the first character in the string).

      Getting this problem right, without boundary or fencepost erros, employing manual indexing and positioning is remarkably tricky. So much so, that I'm not even sure the following fixed-up code is error-free:

      sub crep { my ($str, $chr, $rep, $nth) = @_; my $pos = 0; while (--$nth > 0) { $pos = index $str, $chr, $pos; last if $pos < 0; $pos++; } substr ($str, $pos) =~ s/$chr/$rep/g if $pos >= 0; return $str; }
      I would really like to see if this can be improved upon, assuming the same method is used. In the meantime, here's a slight variant, which I believe to be correct:
      sub crep { my ($str, $chr, $rep, $num) = @_; my $tstr = ''; $tstr .= substr $str, 0, (1 + index $str, $chr), '' while --$num; $str =~ s/$chr/$rep/g; $tstr.$str; }
      update: I'm beginning to believe that this problem is the poster-child for unit testing! A more promising possibility... caveat emptor as always:
      sub crep { my ($str, $chr, $rep, $num) = @_; $str !~ /$chr/g and return $str while $num--; substr ($str, -1 + pos $str) =~ s/$chr/$rep/g; return $str; }
      Of course, this one dances around the index problem with a regex.

      update2: Improving on the original fix...

      sub crep { my ($str, $chr, $rep, $nth) = @_; my $pos = 0; ($pos = index $str, $chr, $pos)++ < 0 and return $str while --$nth; substr ($str, $pos) =~ s/$chr/$rep/g; return $str; }
         MeowChow                                   
                     s aamecha.s a..a\u$&owag.print
      here comes another way :)
      #!/usr/local/bin/perl -w use strict; my $s = "Terence and Philip are sweet\n"; my $c = 'e'; my $r = 1; my $n = 3; sub mine { my ($str,$chr,$new,$nbr)=@_; my $reg="(.+?)$chr" x --$nbr; $_=$str; (/$reg/)&&($str=$&)&&($_=$')&&(s/$chr/$new/g); return $str.$_; } print mine($s,$c,$r,$n);
      The code for benchmarking agains BBFU's proc and Quent's
      #!/usr/local/bin/perl -w use strict; my $s = "Terence and Philip are sweet\n"; my $c = 'e'; my $r = 1; my $n = 3; sub mine { my ($str,$chr,$new,$nbr)=@_; my $reg="(.+?)$chr" x $nbr; $_=$str; (/$reg/)&&($str=$&)&&($_=$')&&(s/$chr/$new/g); return $str.$_; } sub bbfus { my ($str, $chr, $rep, $nth) = @_; my $pos = 0; $pos = index($str, $chr, $pos+1) for(1..$nth); substr($str, $pos) =~ s/$chr/$rep/g; return $str; } sub quents { my ($str, $chr, $rep, $nth) = @_; my $i = 1; $str =~ s/($chr)/$i++<$nth?$1:$rep/eg; return $str; } use Benchmark qw( timethese cmpthese ); cmpthese(500000, { "BBFU'S" => sub { bbfus ($s, $c, $r, $n) }, "Quent's" => sub { quents ($s, $c, $r, $n) }, "MINE" => sub { mine ($s,$c,$r,$n) }, });
      AND THE BENCHMARKS :)
      Benchmark: timing 500000 iterations of BBFU'S, MINE, Quent's... BBFU'S: 20 wallclock secs (19.45 usr + 0.00 sys = 19.45 CPU) @ 25 +706.94/s (n=500000) MINE: 19 wallclock secs (19.87 usr + 0.02 sys = 19.88 CPU) @ 25 +146.69/s (n=500000) Quent's: 25 wallclock secs (24.97 usr + 0.00 sys = 24.97 CPU) @ 20 +026.70/s (n=500000) Rate Quent's MINE BBFU'S Quent's 20027/s -- -20% -22% MINE 25147/s 26% -- -2% BBFU'S 25707/s 28% 2% --
Re: Replacing a given character starting with the xth occurence in a string
by sachmet (Scribe) on May 21, 2001 at 04:28 UTC
    Method without using temporary variables:
    my $x = 3; my $chr = 'e'; my $newchr = '1'; my $str = "Terence and Philip are sweet!\n"; print $str; $x--; 1 while ($str =~ s/(([^$chr]*$chr){$x}([^$chr]*))$chr/$1$newchr/g); print $str;
Re: Replacing a given character starting with the xth occurence in a string
by ZZamboni (Curate) on May 21, 2001 at 04:16 UTC
    Another way of doing it:
    $n=2; # x-1 $chr="e"; $rep="1"; $a="Terence and Philip are sweet"; while ($a=~s/^((?:.*?$chr){$n})(.*?)$chr/$1$2$rep/) {}

    --ZZamboni

    Update: At ar0n's suggestion, the last line can also be written as:

    1 while ($a=~s/^((?:.*?$chr){$n})(.*?)$chr/$1$2$rep/);
    which results in almost exactly what sachmet wrote. I like sachmet's answer better too because it uses only one capturing set of parenthesis.
Re: Replacing a given character starting with the xth occurence in a string
by tachyon (Chancellor) on May 21, 2001 at 04:33 UTC

    Here is a different way using progressive matching and the \G assertion to continue matching from that point. TIMTOWTDI

    tachyon

    my $nth = 3; my $rep = 1; my $chr = 'e'; my $str = "Terence and Philip are sweet"; $str =~ m/$chr/gc for (1..$nth); # eat up first $nth "$chr"s $str =~ s/\G(.*?)$chr/$1$rep/g; # replace the rest print $str;
Re: Replacing a given character starting with the xth occurence in a string
by japhy (Canon) on May 21, 2001 at 05:10 UTC
    I would use tr///.
    sub adv_tr { my ($str, $x, $from, $to) = @_; my $pos = 0; # find index of Xth "$from" ($pos = index($str, $from, $pos)) == -1 and return while $x--; eval "substr(\$str, $pos) =~ tr/\Q$from\E/\Q$to\E/"; }


    japhy -- Perl and Regex Hacker
      Eh, you didn't actually test this, did you? =)

      See my reply to bbfu above. You'd have to do something like:

      sub adv_tr { my ($str, $x, $from, $to) = @_; my $pos = -1; ($pos = index($str, $from, $pos + 1)) == -1 and return $str while $x +--; eval "substr(\$str, $pos) =~ tr/\Q$from\E/\Q$to\E/"; $str; }
         MeowChow                                   
                     s aamecha.s a..a\u$&owag.print
(boo) Re: Replacing a given character starting with the xth occurence in a string
by boo_radley (Parson) on May 21, 2001 at 19:24 UTC
    I love all of the answers to this question, especially ZZamboni's, satchmet's and tachyon's.
    I went over and said to @coworker, "lookie this!", and they replied "what the hell does that do?"
    I explained what the topic was.
    $coworker[1] says "ah, that's faboo, but how does it do it?" and it took me about 5 minutes to puzzle through ZZamboni's. He explained that the idea's useful, but the implementation was a bit opaque for him. We chatted for a few more, and then I suggested the following, which is a much different take.
    use strict; my $s=n_sub ("Toy boats are for the little boys" ,2,"b","g"); $s=n_sub ($s,4,"o","i"); $s=n_sub ($s,2,"y","rl"); print $s; sub n_sub { my ($os, $xth, $ic, $oc) = @_; my @el= split /$ic/,$os, $xth; $el[-1] =~ s/$ic/$oc/g; return join ($ic, @el); }
    I don't suggest that this may be a faster implementation, or a better one, but for those that I showed, it's more understandable.
    I'm not knocking anyone's reg-fu, of course, but even some basic concepts like ?: confound some, and I wanted to show TIMTOWTDI.


    on a related subject, isn't this concept called lookbehind, as in the owl, pg 229 and 230?
Re: Replacing a given character starting with the xth occurence in a string
by sharle (Acolyte) on May 22, 2001 at 03:30 UTC
    Well, I guess I deserved to lose an experience point for my hastily posted "solution" yesterday. In an effort to redeem myself, here is yet another solution to this problem:

    #!/usr/bin/perl -w reg.pl my ($p, @q, $matchchar, $nummatch, $rep, $count, $q, $out); $p = 'Terrence and Phillip are sweet'; $count = 0; $matchchar = "e"; $nummatch = 3; $repchar = "1"; @q = split (/(.*?)/, $p); for($i = 0; $i < $#q + 1; $i++) { $_ = $q[$i]; $count += 1 if (/$matchchar/); if (/$matchchar/ && ($count > $nummatch)) { $q[$i] = $repchar; } } $out = join ("", @q[0 .. $#q]); print "out === $out";

    This one works, and solves the correct problem.

    sharle

      I can see that you put some effort in this program, so I have ++'d your post. Now I'll offer some comments as constructive criticism, and a rewriting of your program to show some more Perlish ways of doing things:
      • The following line:
        @q = split(/(.*?)/, $p);
        Is doing a lot more work than it should. You are actually splitting on empty strings (that's what .*? will always evaluate to) and storing the delimiters (the empty strings), so the string "foo" gets split as ("f", "", "o", "", "o"). If you want to split in individual characters, it's better to do:
        @q = split(//,$p);
        which splits on an empty string, but without the regex, and does not store the delimiters, which you do not need anyway, and does not affect the subsequent code.
      • A matter of style and possibly efficiency: I would much prefer using $i <= $#q as the termination condition in your for. But the really Perlish way of doing it would be:
        foreach (@q) {
        which also automatically assigns $_ for you on each iteration.
      • You don't really need to use regular expressions to do the matching, you could use eq instead, both for clarity and possibly for efficiency.
      • $count++ could be used instead of $count+=1.
      • When you are using $_ inside of a foreach, it becomes an implicit reference to each element of the array, so in this case it has an associated side benefit: you can assign to $_ to modify the array, instead of assigning to $q[$i].
      • The logic could be rearranged to only check against $matchchar once.
      • As per the original specification of the problem, you are to replace starting with the Nth occurrence, so the check should be $count >= $nummatch.
      • You don't need to use @q[0 .. $#q]! Saying @q by itself represents the whole array.
      So here's my first rewrite of the main section of your program:
      @q = split (//, $p); foreach (@q) { $count++ if $_ eq $matchchar; if ($_ eq $matchchar && ($count >= $nummatch)) { $_ = $repchar; } } $out = join ("", @q); print "out === $out";
      Restructuring the insides of the loop, we can get:
      foreach (@q) { if ($_ eq $matchchar) { if (++$count >= $nummatch) { $_ = $repchar; } } }
      Now compressing the two if's, we get:
      foreach (@q) { if ($_ eq $matchchar && ++$count >= $nummatch) { $_ = $repchar; } }
      Now, notice that we are assigning one value to $_ when a certain condition is satisfied, and another (actually leaving its old value) when it's not. So we could use the conditional operator to eliminate the if altogether:
      foreach (@q) { $_ = ($_ eq $matchchar && ++$count >= $nummatch)?$repchar:$_; }
      And now, notice that we are using the foreach to compute a value based on each element of @q. Ideal use of map!
      @q = map { ($_ eq $matchchar && ++$count >= $nummatch)?$repchar:$_ } @q;
      And now we don't need to initially asign the result of the split to @q, because all we are doing with it is passing it as argument to map, so we can do:
      @q = map { ($_ eq $matchchar && ++$count >= $nummatch)?$repchar:$_ } split(//, $p);
      And finally, we can eliminate @q altogether because we can pass the result of the map directly to the join:
      $out = join ("", map { ($_ eq $matchchar && ++$count >= $nummatch)?$repchar:$_ } split (//, $p));
      Proof that any program can be transformed to a one-liner in Perl :-)

      Man that was fun :-)

      --ZZamboni

        This was great! Thanks!

        I spent a huge amount of time trying to figure out how to split on characters, I tried everything except what you used. (I was at work, and so only had the man pages to work from).

        All of your tips were greatly appreciated. I didn't know that foreach automatically assigned $_. Useful knowledge, that.

        I like the result much better than my original, and it was very cool to see how to use map.

        Still Learning, sharle

Re: Replacing a given character starting with the xth occurence in a string
by Zaxo (Archbishop) on May 22, 2001 at 07:21 UTC

    Here's another way that avoids a lot of fuss. It's rather like quent's but avoids
    unneeded churning on the counter.

    It also works with multicharacter substitutions and, with some odd results unless
    the start count < 1, the target can be a regex.

    #!/usr/bin/perl -w # -*-Perl-*- use strict; sub zaxo { my ($str,$chr,$new,$nbr)=@_; $str=~s/$chr/(--$nbr>0)?$chr:$new/geo; $str; } my $s = "Terence and Philip are sweet\n"; my $c = 'e'; my $r = 1; my $n = 3; print zaxo( $s,$c,$r,$n); exit;
    Against the bbfu benchmark posted above (corrected &mine output) I get:
    Benchmark: timing 500000 iterations of BBFU'S, MINE, Quent's, Zaxo's.. +.<BR/> BBFU'S: 12 wallclock secs (12.18 usr + 0.05 sys = 12.23 CPU) MINE: 13 wallclock secs (12.32 usr + 0.06 sys = 12.38 CPU) Quent's: 13 wallclock secs (13.40 usr + 0.03 sys = 13.43 CPU) Zaxo's: 10 wallclock secs (10.59 usr + 0.00 sys = 10.59 CPU)

    I must to agree with MeowChow that this is a litmussy kind of question.

    After Compline,
    Zaxo

Re: Replacing a given character starting with the xth occurence in a string
by Dr. Mu (Hermit) on May 22, 2001 at 10:53 UTC
    The following works without using any explicit iterations. I haven't clocked its execution against the others, though.
    $s = 'abc nbc cbs fox hbo sho cnn rox'; ($orig, $num, $subst) = ('n', 2, '-'); $s =~ s/^(([^$orig]*$orig){$num})(.*)/$1/; ($t = $3) =~ s/$orig/$subst/g; $s .= $t; print $s
    This prints:
    abc nbc cbs fox hbo sho cn- rox
    It works by cutting the string into two parts, applying the substitution to the second part (if there is one) and rejoining them.
Re: Replacing a given character starting with the xth occurence in a string
by sharle (Acolyte) on May 21, 2001 at 10:09 UTC
    Um, I'm pretty new to perl, but your problem is very simple really. I think everyone has tried to make it much more complicated a problem than it really is.

    Try this:

    #!/usr/perl -w my p$ = 'Terrence and Phillip are sweet'; p$ =~ s/w/1/g;
    p$ will then contain "T1rr1nc1 and Phillip ar1 sw11t", I think you'll find.

    sharle

      You wouldn't be the first person here to misread (and therefore mis-answer) a question, so we might all want to be lenient on that account (but do go back and read the original question again, it isn't quite a simple substitution question). But everyone should take the time to test the code that they post at least minimally. Your code does not compile because the variable should be $p not p$, and making that change doesn't give the result you stated because your code replaces w's instead of e's. It looks as if you are trying to be careful by using -w and my() variables, so maybe you tested code and then retyped it for submission, making typos as you went. Copy and paste is a much safer way to participate in online forums.

        You're right of course, my code should have been:

        #!/usr/bin/perl -w my $p = 'Terrence and Philip are sweet'; $p =~ s/e/1/g; print "$p";

        which is a direct copy and paste, but still doesn't give the desired result, I see after re-reading. I'll have to think about that a little harder. Now I see why it wasn't as easy as I thought at first go.

        sharle

Re: Replacing a given character starting with the xth occurence in a string
by tachyon (Chancellor) on May 22, 2001 at 18:13 UTC
    Complete bastard shows how monk's code can be shoretened by 50+% and still do same.......
    # ------------- short summary my $p = 'Terrence and Phillip are sweet'; my ($count,$matchchar,$nummatch,$repchar) = qw(0 e 3 1); my @q = split//,$p; for(@q) { next unless /$matchchar/; $count ++; $_ = $repchar if $count > $nummatch; } print 'out === ',@q # ------------- end short summary. # First you are not using strict which would have pointed # out the fact that you declare a var $rep but use a var called # $repchar and forget to declare $i with my. # #!/usr/bin/perl -w reg.pl # my ($p, @q, $matchchar, $nummatch, $rep, $count, $q, $out); # $p = 'Terrence and Phillip are sweet'; # $count = 0; # $matchchar = "e"; # $nummatch = 3; # $repchar = "1"; # # Bulk my declaration at top kind of looses the scoping value of # my as all these vars are effectively global, although it makes # no difference in your script I like to keep my 'mys' local so # as to speak # $rep is unused, this is supposed to be $repchar but with no # strict you have not allowed perl to tell you. # I have declared and assigned vars almost all at once here # my $p = 'Terrence and Phillip are sweet'; my ($count,$matchchar,$nummatch,$repchar) = qw(0 e 3 1); # # @q = split (/(.*?)/, $p); # # Your split syntax is an odd way to split on null, how about # my @q = split//,$p; # # for($i = 0; $i < $#q + 1; $i++) { # $_ = $q[$i]; # $count += 1 if (/$matchchar/); # if (/$matchchar/ && ($count > $nummatch)) { # $q[$i] = $repchar; # } # } # # You can clean this iteration up heaps. # You could write # for my $i(0..$#q) { # which would shorten it down and cure the missing my but there # is more. All you need is a # for(@q) { # within this loop each element of the @q array is aliased to the # magical variable $_ If we modify $_ we modify that array element. # Also your logic can be improved as you test for a match to # $matchchar twice wich is unecessary. I use a next unless # construct as it makes it obvious what this loop does - if we do # not /$matchchar/ we do the next iteration, read no further! # So without further ado, let's just do: # for(@q) { next unless /$matchchar/; $count ++; $_ = $repchar if $count > $nummatch; } # # $out = join ("", @q[0 .. $#q]); # print "out === $out"; # # join'',@q is much shorter than join ("", @q[0 .. $#q]) and does # the same but but this is shorter. # print "out ===", join'',@q; # has the same effect as these two lines and skips the unecessary # asignment to $out # As print @foo is the same as print join '', @foo; # we can shorten this further to: # print 'out === ',@q; # # BTW # print "@foo"; # is the same as: # print join $", @foo; # The output record seperator $" is set to ' ' by default, but you # can set it to anything you want. So $"='';print "out === @q"; is # yet another variation. # Here is my version of your code, which is remarkably like the # code I did not post in the first place as the Camel book says # when you think you want to chop a string up into substrings # what you really eant is the \G asertion. =pod #!/usr/bin/perl use strict; my $p = 'Terrence and Phillip are sweet'; my ($count,$matchchar,$nummatch,$repchar) = qw(0 e 3 1); my @q = split//,$p; for(@q) { next unless /$matchchar/; $count ++; $_ = $repchar if $count > $nummatch; } print 'out === ',@q; =cut
Re: Replacing a given character starting with the xth occurence in a string
by zeidrik (Scribe) on May 22, 2001 at 16:58 UTC
    Here is possibly the most compact way (not the fastest yet)
    my $s = "Terence and Philip are sweet\n"; my $c = 'e'; my $r = 1; my $n = 3; ($s=~/((.*?)$c){$n}/)&&(($s,$_)=($&,$'))&&(s/$c/$r/g)&&(print $s,$_);

      Using $& and $' is a bad idea since it slows down all regexen for the life of that instance of the Perl interpreter (for the life of the script). Perhaps you know that but I don't want casual readers to get the wrong idea about those.

              - tye (but my friends call me "Tye")