monkfan has asked for the wisdom of the Perl Monks concerning the following question:

Hi,

I have a string with brackets inside it. When the string has more than one bracket pair, I would like to: 1) remove each consecutive closing and opening bracket pair (][), and 2) replace the characters in between with "N", one N for each character replaced. Here are some examples:
my $input_str1 = "TG[CCC]CC[TTT]";
# Desired result is this:
my $rep1 = "TG[CCCNNTTT]"; # Two Ns replace the two Cs in between ][.
# Similarly
my $input_str2 = "TG[CCC][TTT]";
my $rep2 = "TG[CCCTTT]";
But when there is only one bracket pair, I want to leave the string intact:
my $input_str3 = "TG[CCAAATTT]";
# Desired result is this:
my $rep3 = "TG[CCAAATTT]";
I tried this, but it doesn't work:
$str =~ s/\][ATCG]+\[/N/;
Is there a single regex stroke that can handle above situations?
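To see why the attempt above falls short, here is a small sketch that runs it on the three sample strings. The helper name original_attempt is hypothetical and exists only for illustration. The pattern swaps the whole ]...[ run for a single literal N, and because + demands at least one letter, it never matches the adjacent ][ in the second sample.

```perl
use strict;
use warnings;

# The original attempt: the whole ]...[ run becomes one literal "N",
# and "+" needs at least one letter, so an adjacent "][" never matches.
sub original_attempt {
    my ($str) = @_;
    $str =~ s/\][ATCG]+\[/N/;
    return $str;
}

for my $in ("TG[CCC]CC[TTT]", "TG[CCC][TTT]", "TG[CCAAATTT]") {
    print "$in -> ", original_attempt($in), "\n";
}
```

This prints TG[CCCNTTT] for the first sample (one N instead of two) and leaves the second and third samples unchanged.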

Regards,
Edward

Re: Quantified Regex Replacement
by japhy (Canon) on Feb 23, 2006 at 05:42 UTC
    I'd suggest s/\]([ATCG]*)\[/"N" x length($1)/e, which replaces the ]...[ part with one "N" for each letter between the closing and opening bracket. (The * rather than + also covers an adjacent ][, which gets zero Ns.)
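A minimal sketch of the /e suggestion, wrapped in a hypothetical merge_with_n helper and run on the three sample strings. The /e flag makes Perl evaluate the replacement as code, so "N" x length($1) builds one N per captured letter. It uses * instead of + so that the adjacent ][ in the second sample also matches and collapses to nothing.

```perl
use strict;
use warnings;

# Replace a ]...[ boundary with one "N" per enclosed letter.
# /e evaluates the replacement as Perl code; "*" also matches "][" (zero letters).
sub merge_with_n {
    my ($str) = @_;
    $str =~ s/\]([ATCG]*)\[/"N" x length($1)/e;
    return $str;
}

for my $in ("TG[CCC]CC[TTT]", "TG[CCC][TTT]", "TG[CCAAATTT]") {
    print "$in -> ", merge_with_n($in), "\n";
}
```

The single-pair string has no ] followed by [, so the pattern never matches and it is left intact without any special-casing.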

    Jeff japhy Pinyan, P.L., P.M., P.O.D, X.S.: Perl, regex, and perl hacker
    How can we ever be the sold short or the cheated, we who for every service have long ago been overpaid? ~~ Meister Eckhart

      Hi, try this:

      while (<DATA>) {
          s/\]([ATCG]*)\[/'N' x length($1)/e;
          print $_;
      }
      __DATA__
      TG[CCC]CC[TTT]
      TG[CCC][TTT]
      TG[CCAAATTT]

      Respective output is:
      TG[CCCNNTTT]
      TG[CCCTTT]
      TG[CCAAATTT]
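The loop above performs one substitution per line, which is enough for the two-pair samples. If a line can contain three or more bracket pairs (an assumption beyond the original samples), adding /g replaces every ]...[ boundary. This sketch compares the two using hypothetical helpers merge_once and merge_all and a made-up three-pair string.

```perl
use strict;
use warnings;

# Without /g, only the first ]...[ boundary is replaced.
sub merge_once {
    my ($str) = @_;
    $str =~ s/\]([ATCG]*)\[/'N' x length($1)/e;
    return $str;
}

# With /g, every ]...[ boundary is replaced, scanning left to right.
sub merge_all {
    my ($str) = @_;
    $str =~ s/\]([ATCG]*)\[/'N' x length($1)/ge;
    return $str;
}

my $three = "TG[AA]C[GG]TT[CC]";   # hypothetical three-pair input
print merge_once($three), "\n";    # TG[AANGG]TT[CC]
print merge_all($three), "\n";     # TG[AANGGNNCC]
```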

      Update: monkfan, try the code below for your second question.

      while (<DATA>) {
          # Keep the letters between ] and [, turning any run of non-C letters into Ns.
          s/\]([ATCG]*)\[/my $mid = $1; $mid =~ s!([^C]+)!'N' x length($1)!ge; $mid/e;
          print $_;
      }
      __DATA__
      TG[CCC]CC[TTT]

      Respective output is:
      TG[CCCCCTTT]

      Regards,
      Velusamy R.


      eval"print uc\"\\c$_\""for split'','j)@,/6%@0%2,`e@3!-9v2)/@|6%,53!-9@2~j';

      Thanks so much for your reply.
      I am also wondering how I can simply remove the two consecutive reversed brackets, ][, like this:
      my $str = "TG[CCC]CC[TTT]";
      # Into
      my $rep = "TG[CCCCCTTT]";
      But my regex below doesn't do it right; it also removes the two Cs in between:
      $str =~ s/\]([ATCG]+)\[//;

      Regards,
      Edward
        That's because your substitution matches the ]...[ part and replaces it with nothing! You'll have to replace it with $1, which, in your regex, contains the letters between the closing and opening brackets:
        s/\]([ACTG]+)\[/$1/
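A minimal sketch of the $1 approach, wrapped in a hypothetical join_pairs helper and run on the original sample strings. It departs slightly from the one-liner above: * instead of + lets an adjacent ][ collapse as well, and /g handles every boundary in a longer string.

```perl
use strict;
use warnings;

# Put the captured letters ($1) back and drop only the "][" delimiters.
# "*" also matches an adjacent "][" (zero letters); /g handles every boundary.
sub join_pairs {
    my ($str) = @_;
    $str =~ s/\]([ACTG]*)\[/$1/g;
    return $str;
}

print join_pairs("TG[CCC]CC[TTT]"), "\n";   # TG[CCCCCTTT]
print join_pairs("TG[CCC][TTT]"), "\n";     # TG[CCCTTT]
print join_pairs("TG[CCAAATTT]"), "\n";     # TG[CCAAATTT]
```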

        Jeff japhy Pinyan, P.L., P.M., P.O.D, X.S.: Perl, regex, and perl hacker
        How can we ever be the sold short or the cheated, we who for every service have long ago been overpaid? ~~ Meister Eckhart