Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

I have a question about a tricky puzzle: removing repeated characters from a long string, for example: bbbccabcabcabcdba => bbb c cab cab cab cdba => b* c cab* cdba. It looks like string compression, or like finding the longest repeating substring. Does anyone have an idea about this? Many thanks.

Re: remove repeating characters from strings
by chromatic (Archbishop) on Jul 14, 2002 at 22:56 UTC
    If you just want to remove them, look at the /s flag to the transliteration operator (tr//). If you want to replace them, use a regular expression substitution with backreferences. It may be something like: s/(\w)\1+/$1\* /g;. That's untested, but it should get you on your way.
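A minimal sketch of the two suggestions above, in case it helps to see them side by side. The sub names squeeze_runs and mark_runs are made up for illustration; the substitution is the one from the post, which only collapses runs of a single repeated character.

```perl
use strict;
use warnings;

# Hypothetical helper: tr with the /s flag squeezes each run of the
# same character down to a single copy.
sub squeeze_runs {
    my ($s) = @_;
    (my $out = $s) =~ tr/a-zA-Z0-9_//s;
    return $out;
}

# Hypothetical helper: the backreference substitution from the post.
# (\w)\1+ matches a word character followed by one or more copies of itself.
sub mark_runs {
    my ($s) = @_;
    (my $out = $s) =~ s/(\w)\1+/$1\* /g;
    return $out;
}

print squeeze_runs("bbbccabcabcabcdba"), "\n";   # bcabcabcabcdba
print mark_runs("bbbccabcabcabcdba"), "\n";      # b* c* abcabcabcdba
```

Neither version groups multi-character units such as cab.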
      The problem with that approach is that, given the string ccabcabcab, it will group the first cc and only the second and third cab, whereas the question indicated that the longest match should be favoured, not the leftmost.

      Here's a solution that favours the longest match, finds recursive repetitions, and uses {$n} instead of *.

      $_ = "bbbccabcabcabcdba";
      my $l = length;
      s{((\w{$l})\2+)}{($2){@{[length($1)/length($2)]}}}g while --$l;
      This will produce:
      (b){3}c(cab){3}cdba
      Changing $_ to bbbccaabcaabcaabcdba, the above code will yield:
      (b){3}c(c(a){2}b){3}cdba
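For readers who find the one-liner dense, here is the same idea unrolled into a sub. This is only a sketch: the name compress_repeats is hypothetical, and the /e replacement is a stylistic swap for the @{[...]} interpolation used above. It tries the longest candidate unit first and shrinks it by one character per pass, so shorter repeats inside an already grouped unit are picked up later.

```perl
use strict;
use warnings;

# Hypothetical helper name; the logic follows the one-liner above.
sub compress_repeats {
    my ($str) = @_;

    # Start with the longest possible unit and work down to one character.
    for (my $len = length($str) - 1; $len >= 1; $len--) {
        # $2 is one unit of $len word characters, $1 is the whole run of
        # at least two consecutive copies of that unit.
        $str =~ s{((\w{$len})\2+)}{
            my $count = length($1) / length($2);
            "($2){$count}";
        }ge;
    }
    return $str;
}

print compress_repeats("bbbccabcabcabcdba"), "\n";      # (b){3}c(cab){3}cdba
print compress_repeats("bbbccaabcaabcaabcdba"), "\n";   # (b){3}c(c(a){2}b){3}cdba
```

Because \w does not match parentheses or braces, groups produced by an earlier pass are not merged across their delimiters by later passes.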

      Abigail