maithree has asked for the wisdom of the Perl Monks concerning the following question:

Hi,

I am trying to do some replacing, like:

Search for repeated occurrences of a character at the end of a string and replace it with half the number of characters. It is assumed that it is always a multiple of 2.

We can match n occurrences with something like

s/(pattern+$){n}/g;
Is there something similar for replacing, something like...
s/(pattern+$){n}/(pattern){n/2}/g;
The value of n is not fixed, so I cannot hardcode it in the command.

Is there any simple search/replace command that I can use to achieve this?

Thanks.

update (broquaint): added formatting

Re: Replacing with multiple occurrences.
by broquaint (Abbot) on May 08, 2003 at 10:52 UTC
    Is there something similar for replacing, something like...
    Sure is, just use a bit of back-referencing and an evaluated replacement and you're sitting pretty:

        my @strs = qw/fooxx baryyyy bazaaaaaa/;
        s{ ( (.) \2+ ) \z }(substr($1, 0, length($1) / 2))ex, print "$_\n" for @strs;
        __output__
        foox
        baryy
        bazaaa

    See perlre for more info.

    In case you're working on large strings the above regex would be quite hefty in terms of processing, so here's a version that should perform much better on longer strings:

        scalar(reverse $_) =~ /\A((.)\2+)/ and substr($_, -(length($1) / 2)) = '' for @strs;
        print map "$_\n", @strs;
        __output__
        foox
        baryy
        bazaaa

    This is less processor intensive because the match is anchored at the start of the reversed string, so the engine only has to try one starting position, whereas the first regex is attempted at successive positions all the way through the string until it reaches the trailing run.
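
    A minimal sketch of the two approaches above, wrapped as subroutines so they can be run side by side on the same inputs. The names halve_tail_regex and halve_tail_reverse are hypothetical (not from the post), and the int() calls and /s flag are assumptions added to make the truncation and newline handling explicit.

```perl
use strict;
use warnings;

# Hypothetical helper: halve the run of repeated characters at the end
# of the string with a single evaluated substitution.
sub halve_tail_regex {
    my ($str) = @_;
    $str =~ s{ ( (.) \2+ ) \z }{ substr($1, 0, int(length($1) / 2)) }exs;
    return $str;
}

# Hypothetical helper: match the run at the start of the reversed string,
# then chop the surplus characters off the end of the original.
sub halve_tail_reverse {
    my ($str) = @_;
    if (scalar(reverse $str) =~ /\A((.)\2+)/s) {
        substr($str, -int(length($1) / 2)) = '';
    }
    return $str;
}

print halve_tail_regex($_), ' ', halve_tail_reverse($_), "\n"
    for qw/fooxx baryyyy bazaaaaaa/;
```

    For runs of even length both helpers agree. For a run of odd length they differ: the first keeps the smaller half, the second removes the smaller half.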
    HTH

    _________
    broquaint

    update: added second code example

Re: Replacing with multiple occurrences.
by strat (Canon) on May 08, 2003 at 11:00 UTC
    I'm not sure if I understand you correctly, but maybe ...

    In a regular expression, you can use variables as well:

        $variable =~ s/($pattern)+/..../

    Beware that $pattern is interpolated, so if $pattern contains something like \, it may even become an error, e.g.

        C:\>perl
        $pattern = "\\";
        $variable = "abcde\\fg";
        print $variable =~ /$pattern/;
        ^D
        Trailing \ in regex m/\/ at - line 4.

    To prevent this danger of interpolation, use /(\Q$pattern\E)+/. If you want to capture the whole repeated expression and not just a single appearance, put parentheses around it and group the inner part without capturing, using (?:...), e.g.

        $variable =~ s/((?:\Q$pattern\E)+)/..../

    or in a better documented way (the comments avoid sigils, since variables are interpolated even inside /x comments):

        $variable =~ s/ (                # start capture group 1
                        (?:              # not capturing, just clustering
                          \Q$pattern\E   # the pattern, quoted
                        ) +              # one or more times
                      )                  # end capture group 1
                    / .... /x;

    If you want to put code on the right-hand side of the substitution, you could use /e to evaluate it, e.g.

        $variable =~ s/((?:\Q$pattern\E)+)/$1.$1.$1/e;
        print $variable;
    The best help may be reading perldoc perlre
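
    Putting the pieces above together for the original question, here is a minimal sketch that halves the number of trailing copies of a literal string. The name halve_trailing_pattern is hypothetical, and treating the pattern as a literal string (via \Q...\E) and rounding down with int() are assumptions, not something stated in the thread.

```perl
use strict;
use warnings;

# Hypothetical helper: halve the number of trailing repetitions of the
# literal string $pattern at the end of $str.
sub halve_trailing_pattern {
    my ($str, $pattern) = @_;
    return $str if $pattern eq '';    # avoid dividing by zero below
    $str =~ s{
        ( (?: \Q$pattern\E )+ )   # group 1: one or more copies, metacharacters quoted
        \z                        # only at the very end of the string
    }{
        my $count = length($1) / length($pattern);   # how many copies matched
        $pattern x int($count / 2);                  # keep half of them
    }ex;
    return $str;
}

print halve_trailing_pattern('ab.ab.ab.ab.', 'ab.'), "\n";
print halve_trailing_pattern('abXabXab.ab.', 'ab.'), "\n";
```

    Because of \Q...\E the dot in 'ab.' only matches a literal dot, so in the second call just the last two copies count as repetitions.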

    Best regards,
    perl -e "s>>*F>e=>y)\*martinF)stronat)=>print,print v8.8.8.32.11.32"

Re: Replacing with multiple occurrences.
by Skeeve (Parson) on May 08, 2003 at 11:23 UTC

      This wasn't tested, I assume, as it doesn't work as requested. Try it with my string below.

      Update: To clarify: 'aaabcdabcddddddefgg' would become 'aaabcdabcddddddefg' using your version.

      Update 2: Oops. I misread the question as 'all' duplicates in the string. Thanks to physi for pointing out my fault. Interesting exercise, anyway.

Re: Replacing with multiple occurrences.
by perlguy (Deacon) on May 08, 2003 at 15:55 UTC

    Update: as stated before, I misread the question. There is no problem with Skeeve's solution, as it does what is stated.

    Because I couldn't get Skeeve's to work on my strings, I did my own version, which should work:

        my $string = 'aaabcdabcddddddefgg';
        $string =~ s/((.)(?(?=\2{2,})\2+))\1/$1/g;
        print "$string\n";

    I used a conditional ('if two or more of the previous character follow, swallow them', prior to the backtrack, but anyhow), as it wouldn't work with duplicates when two or three were back to back, and * wasn't the solution, either.

    Also to note, this 'rounds up', so if there are three back to back, two will remain (3 / 2 = 1.5, rounded up to 2).
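
    A small sketch to check the rounding behaviour described above, assuming runs of a single repeated character embedded between two other characters. The helper name halve_runs_conditional is hypothetical; the regex inside it is the one from this post.

```perl
use strict;
use warnings;
use POSIX qw(ceil);

# Hypothetical helper wrapping the conditional regex from the post.
sub halve_runs_conditional {
    my ($str) = @_;
    $str =~ s/((.)(?(?=\2{2,})\2+))\1/$1/g;
    return $str;
}

# Feed in runs of 1 to 7 'a's and count how many survive.
for my $n (1 .. 7) {
    my $out  = halve_runs_conditional('x' . ('a' x $n) . 'y');
    my $kept = ($out =~ tr/a//);    # tr with an empty replacement just counts
    printf "run of %d -> %d kept (ceil(n/2) = %d)\n", $n, $kept, ceil($n / 2);
}
```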

      As I understand the question, it was only about the pattern at the end of the string. So Skeeve's works very well.
      If you want to halve every run in the string, just do:

          s/((.)\2*)\1/$1/g;

      That gives the same output as yours does.

      -----------------------------------
      --the good, the bad and the physi--
      -----------------------------------
      
      Quite complicated, isn't it? As physi already stated, a simple:

          s/((.)\2*)\1/$1/g

      will do the trick of halving each repeated occurrence and rounding up. So "aaabbbb" will become "aabb".
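
      A minimal sketch of that substitution applied to strings from this thread; the helper name halve_all_runs is hypothetical. Note that it halves every run in the string, not only the one at the end, so it answers the misread version of the question.

```perl
use strict;
use warnings;

# Hypothetical helper wrapping the substitution quoted above.
sub halve_all_runs {
    my ($str) = @_;
    $str =~ s/((.)\2*)\1/$1/g;
    return $str;
}

printf "%-20s -> %s\n", $_, halve_all_runs($_)
    for qw/aaabbbb aaabcdabcddddddefgg fooxx/;
```

      With 'fooxx' both the "oo" and the "xx" are halved, whereas an end-anchored version would leave the "oo" alone.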