Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

Hi Monks!

I want to ask your help to construct a regex that would make the following substitution:

substitute all occurrences of a character (let's call it 'X') that is inbetween two others (let's say 'A'), with 'A'.

So, this string:

AAAAAXXXXXXXAAAXXXXXAXXXXAAAA

becomes:
AAAAAAAAAAAAAAAAAAAAAAAAAAAAA

I don't know how long the stretch of 'X's is.

My approach is to split the string multiple times and replace 'X' with 'A', but I am sure there must be a way to do it very easily.

Replies are listed 'Best First'.
Re: Replace all characters inbetween
by Tux (Canon) on Jun 05, 2022 at 08:28 UTC

    TIMTOWTDI (and likely a lot more than just these three)

    use 5.018003; use warnings; use DP; use Test::More tests => 3; use Benchmark "cmpthese"; my $n = 100; my $s = "AAAAAXXXXXXXAAAXXXXXAXXXXAAAAA" x 100; my $A = "A" x 10000; sub s_ger { $s =~ s{A\K([^A]+)(?=A)} {"A" x length + $1 }ger } sub substr_ger { $s =~ s{A\K([^A]{1,10000})(?=A)}{substr $A, 0, length + $1 }ger } sub split_join { my @s = split m/A\K([^A]+)(?=A)/ => $s; for ($_ = 1; $_ < $#s; $_ += 2) { $s[$_] = "A" x length $s[$_]; } join "" => @s; } is (s_ger (), "A" x 3000, "s//x/ger"); is (substr_ger (), "A" x 3000, "s//substr/ger"); is (split_join (), "A" x 3000, "split/join"); cmpthese (-2, { s_ger => \&s_ger, subst => \&substr_ger, spljn => \&split_join, });

    ->

    $ perl test.pl 1..3 ok 1 - s//x/ger ok 2 - s//substr/ger ok 3 - split/join Rate spljn s_ger subst spljn 3330/s -- -2% -6% s_ger 3405/s 2% -- -4% subst 3550/s 7% 4% --

    YMMV


    Enjoy, Have FUN! H.Merijn
      Hi Tux!
      Thank you so much for the help!! If I may, can you take out one of the solutions, because I have hard time isolating one?
      I tried this:
      my $s = "AAAAAXXXXXXXAAAXXXXXAXXXXAAAAA"; $s =~s{A\K([^A]+)(?=A)} {"A" x length $1 }; print $s."\n";
      but it does not print something proper.

        Just a general note for future reference: As a rule, "it doesn't work" is a very poor problem description. The code you posted is short enough that you could have listed its output and any warning/error messages, etc., as well. Please see Short, Self-Contained, Correct Example.


        Give a man a fish:  <%-{-{-{-<

Re: Replace all characters inbetween
by AnomalousMonk (Archbishop) on Jun 05, 2022 at 09:51 UTC

    I think an s///e approach like Tux's s_ger() here is probably best for readability/maintainability, but just for grins, here's a "pure" regex approach:

    Win8 Strawberry 5.8.9.5 (32) Sun 06/05/2022 5:38:06 C:\@Work\Perl\monks >perl use strict; use warnings; my $s = 'AAAAAXXXXXXXAAAXXXXXAXXXXAAAA'; print "'$s' \n"; $s =~ s{ (?: \G (?! \A) | (?<= A)) X (?= X* A) }{a}xmsg; print "'$s' \n"; ^Z 'AAAAAXXXXXXXAAAXXXXXAXXXXAAAA' 'AAAAAaaaaaaaAAAaaaaaAaaaaAAAA'
    Please see perlre, perlretut, perlreref, and perlrequick.

    Update: And a shameless elaboration:

    Win8 Strawberry 5.8.9.5 (32) Sun 06/05/2022 6:52:29 C:\@Work\Perl\monks >perl use strict; use warnings; my %replace = map { 'X' x $_ => 'a' x $_ } 1, 3, 7; my ($rx_search) = map qr{ $_ }xms, join ' | ', map quotemeta, reverse sort keys %replace ; # print "\$rx_search $rx_search \n"; # for debug for my $s ( 'AAAAAXXXXXXXAAAXXXXXAXXXXAAAA', 'XXXXXXXAAAXXXXXAXXXX', 'AXA', 'XXXXXXXXXXX', 'AAAAAAAAAAA', 'XA', 'AX', 'X', 'A', '', ) { print "'$s' \n"; (my $t = $s) =~ s{ (?: \G (?! \A) | (?<= A)) ($rx_search) (?= X* A +) } {$replace{$1}}xmsg; print "'$t' \n\n"; } ^Z 'AAAAAXXXXXXXAAAXXXXXAXXXXAAAA' 'AAAAAaaaaaaaAAAaaaaaAaaaaAAAA' 'XXXXXXXAAAXXXXXAXXXX' 'XXXXXXXAAAaaaaaAXXXX' 'AXA' 'AaA' 'XXXXXXXXXXX' 'XXXXXXXXXXX' 'AAAAAAAAAAA' 'AAAAAAAAAAA' 'XA' 'XA' 'AX' 'AX' 'X' 'X' 'A' 'A' '' ''


    Give a man a fish:  <%-{-{-{-<

Re: Replace all characters inbetween
by jwkrahn (Abbot) on Jun 05, 2022 at 12:35 UTC
    $ perl -le' my $x = "AAAAAXXXXXXXAAAXXXXXAXXXXAAAA"; print $x; $x =~ s/(?<=A)X+(?=A)/ "A" x ( $+[0] - $-[0] ) /eg; print $x; ' AAAAAXXXXXXXAAAXXXXXAXXXXAAAA AAAAAAAAAAAAAAAAAAAAAAAAAAAAA

    Or:

    $ perl -le' my $x = "AAAAAXXXXXXXAAAXXXXXAXXXXAAAA"; print $x; $x =~ s/(?<=A)X+(?=A)/@{[ "A" x ( $+[0] - $-[0] ) ]}/g; print $x; ' AAAAAXXXXXXXAAAXXXXXAXXXXAAAA AAAAAAAAAAAAAAAAAAAAAAAAAAAAA
      Thank you! How about the begin and end of the string? I realised that it does not change these, if present.

        Do you mean something like this?

        Win8 Strawberry 5.8.9.5 (32) Sun 06/05/2022 10:21:09 C:\@Work\Perl\monks >perl use strict; use warnings; for my $s ( 'AAAAAXXXXXXXAAAXXXXXAXXXXAAAA', 'AXA', 'XXXXXXXAAAXXXXXAXXXX', 'XXXXX', 'XA', 'AX', 'X', 'AAAAA', 'YXY', 'A', '', ) { (my $t = $s) =~ s{ (?<! [^A]) X+ (?! [^A]) } {@{[ 'a' x ($+[0] - $-[0]) ]}}xmsg; print "'$s' \n'$t' \n\n"; } ^Z 'AAAAAXXXXXXXAAAXXXXXAXXXXAAAA' 'AAAAAaaaaaaaAAAaaaaaAaaaaAAAA' 'AXA' 'AaA' 'XXXXXXXAAAXXXXXAXXXX' 'aaaaaaaAAAaaaaaAaaaa' 'XXXXX' 'aaaaa' 'XA' 'aA' 'AX' 'Aa' 'X' 'a' 'AAAAA' 'AAAAA' 'YXY' 'YXY' 'A' 'A' '' ''


        Give a man a fish:  <%-{-{-{-<

        What is supposed to happen at the begin and end of the string? Please at least give some examples.


        Give a man a fish:  <%-{-{-{-<

        How about the begin and end of the string? I realised that it does not change these, if present.

        the problem statement was substitute all occurences of a character (let's call it 'X') that is inbetween two others

Re: Replace all characters inbetween
by LanX (Saint) on Jun 05, 2022 at 19:30 UTC
    > So, this string:

    > AAAAAXXXXXXXAAAXXXXXAXXXXAAAA

    > becomes:

    > AAAAAAAAAAAAAAAAAAAAAAAAAAAAA

    replacing "X" -> "a" for better demo

    use v5.12.0; use warnings; my $in = 'AAAAAXXXXXXXAAAXXXXXAXXXXAAAA'; my $out = $in =~ s [ (?<=A) X+ (?=A) ] [ $& =~ tr/X/a/r ]gexr; say $in; say $out;

    AAAAAXXXXXXXAAAXXXXXAXXXXAAAA AAAAAaaaaaaaAAAaaaaaAaaaaAAAA

    Cheers Rolf
    (addicted to the Perl Programming Language :)
    Wikisyntax for the Monastery

    update

    simplified $1 to $&

Re: Replace all characters inbetween
by GrandFather (Saint) on Jun 06, 2022 at 21:55 UTC

    Using a simple substitution with look behind and look ahead seems to fit the bill:

    use strict; use warnings; my $str = 'XAXAAAAXXXXXXXAAXXXXXAXXXXAAAAX'; my $before = $str; $str =~ s/(?<=A)(X+)(?=A)/'a' x length $1/ge; print "$before\n"; print "$str\n";

    Prints:

    XAXAAAAXXXXXXXAAXXXXXAXXXXAAAAX XAaAAAAaaaaaaaAAaaaaaAaaaaAAAAX

    with lower case 'a' used to make the substitutions clear.

    Optimising for fewest key strokes only makes sense transmitting to Pluto or beyond