Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

When doing a case-insensitive match, what is the easiest way to make the capitalization of the replacement string reflect the capitalization of the matched string?

example:
s/bad/sad/i
such that Bad -> Sad
and BAD -> SAD

That looks awfully hard. As a special case that I do often, how about s/word/anotherword/ where the first letter of anotherword is capitalized iff word is capitalized?

To be more specific, the question is this: I am doing this the long way, just as I would do it in C. Is there a way that takes advantage of some of Perl's unique features?

  • Comment on Case-sensitive substitution with case-insensitive matches

Replies are listed 'Best First'.
Re: Case-sensitive substitution with case-insensitive matches
by chipmunk (Parson) on Dec 02, 2000 at 03:23 UTC
    Here is one way to do it: s/(bad)/$1 & (' ' x length $1) | 'SAD'/ie; This works because the ASCII code for space (0x20) is exactly the bit that distinguishes between ASCII upper and lower case letters.

    $1 & (' ' x length $1) returns a string containing nulls and spaces. Each uppercase character in $1 produces a null in the string, and each lowercase character produces a space.

    Or-ing that with 'SAD' turns on the "lowercase bit" for each letter in 'SAD' that lines up with a space in our string. The characters that line up with nulls are unchanged.

    So, effectively, this copies all the lowercase bits from $1 to 'SAD'.

    This is hard for me to explain, please let me know if it's not clear.

      chipmunk: This is very impressive. Good work! But just to throw a wrench in there, this solution is going to have problems with EBCDIC and I don't know that it would work with UniCode (really haven't looked at it) so it does limit the portability of this really sweet hack.

      That being said, I'm really impressed :)

      Cheers,
      Ovid

      Join the Perlmonks Setiathome Group or just click on the the link and check out our stats.

Re: Case-sensitive substitution with case-insensitive matches
by runrig (Abbot) on Dec 02, 2000 at 03:51 UTC
    ++ for chipmunk's answer, but for unicode or strings with non-letters in them, here's a less elegant solution (actually this is not tested with unicode, so someone correct me if I'm wrong):
    #!/usr/local/bin/perl -l -w use strict; my $str = "SAdBoy"; $str =~ s/($str)/fix_case($1, 'badgirl')/eig; print $str; sub fix_case { my $match_word = shift; my $replace_word = shift; my $i = 0; for (split '', $match_word) { next if $_ eq lc; substr($replace_word, $i, 1) = uc substr($replace_word, $i, 1); } continue { $i++ } return $replace_word; }
    Update: This and my other answer slightly updated, I just like it better now :-)
      runrig's solution also works very nicely with locales. In a locale where 'áÁéÉ' are all letters, for example, ($_ = "Á") =~ s/(á)/fix_case($1, 'é')/eig; will change $_ from "Á" to "É"!
Re: Case-sensitive substitution with case-insensitive matches
by chromatic (Archbishop) on Dec 02, 2000 at 03:16 UTC
    My initial thought for the capitalization is something like this:

    s/(bad)/my $str = "sad"; if (substr($1, 0, 1) eq 'B') { ucfirst $str }; $str/gie;

    but that strikes me as ugly. Another approach would be building a hash with the possibilities:

    my %insens = ( bad => 'sad', Bad => 'Sad', bAd => 'sAd', ); s/(bad)/$insens{$1}/gi;
    Build the hash programmatically if necessary.
Re: Case-sensitive substitution with case-insensitive matches
by runrig (Abbot) on Dec 02, 2000 at 05:00 UTC
    Another solution, if all words can be put in a hash (escaping any special regex characters):
    #!/usr/local/bin/perl -l -w use strict; my %repl_words = (sadboy=>'badboy', sadgirl=>'badgirl'); my $re = join('|', keys %repl_words); $re = qr/$re/i; my $str = "this SadBoy and SadGirl are..."; $str =~ s/($re)/fix_case($1, $repl_words{lc($1)})/eg; print $str; sub fix_case { my $match_word = shift; my $replace_word = shift; my $i = 0; for (split '', $match_word) { next if $_ eq lc; substr($replace_word, $i, 1) = uc substr($replace_word, $i, 1); } continue { $i++ } return $replace_word; }
      runrig, I feel like an idiot :)

      Below I offered what I thought was a different perspective on this problem, then I realized I had redone your solution. Ooops.

      I would, however, suggest that your solution could be improved by using split and join as follows:

      #!/usr/local/bin/perl -l -w use strict; my $str = "SAdBoy"; $str =~ s/($str)/fix_case($1, 'badgirl')/eig; print $str; sub fix_case { my ($match_word, $replace_word) = @_; my @rep = split //, $replace_word; my $i = 0; for (split '', $match_word) { $rep[$i] = $_ eq lc($_) ? lc($rep[$i++]) : uc($rep[$i++]); } return join '', @rep; }
      I also tossed in a lc() so that capitalization in the $replace_word doesn't "contaminate" the pattern in the $match_word.

      I offer this because I am of the impression that using substr that often is somewhat expensive.

Re: Case-sensitive substitution with case-insensitive matches
by snax (Hermit) on Dec 02, 2000 at 17:27 UTC
    I thought a subroutine like the following would be more intuitive. Seems that pack and unpack handle 8 bit chars properly with the 'a' template, too.

    Note that this sub takes the capitalization template of the original word to be absolute, ignoring the case of the replacement. Drop the lc() call in the second map to have it respect capitalization in the replacement.

    #!/usr/local/bin/perl -w use strict; use warnings; sub capsub ($$) { my ($old, $new) = @_; (my $len = length($new)) == length($old) or die "Won't work.\n"; # Find the UC chars my @uc = map {($_ eq uc($_))? 1:0} unpack('a' x $len, $old); # Do the swap ignoring case $old =~ s/$old/$new/i; # Redo the capitalization my $j = 0; @uc = map {$uc[$j++] ? uc($_) : lc($_)} unpack('a' x $len, $old); # Put it back together return pack('a' x $len, @uc); } my $x = capsub('BaD', 'sad'); my $y = capsub('dOOd','LeeT'); print $x, $/, $y, $/; __END__ SaD lEEt

    Update:
    Why on earth am I using pack and unpack? Must be something in the water.

    Replace those calls with split and join calls and it's a little more sensible, and I can drop the $len variable.

    Finally, the bit that says, "Do the swap ignoring case" is just useless. That bit always returns $new, so I should just use $new in the next unpack (or split).

    The fact is, this simply copies the capitalization pattern from the first argument to the second. This can then be used in a regex with the e modifier to obtain the desired results.

Re: Case-sensitive substitution with case-insensitive matches
by turnstep (Parson) on Dec 02, 2000 at 05:12 UTC

    Here is Another Way:

    my $foo = "AbC"; my $bar = "Ducks"; $string = "ABCDEFGHIJabc"; $string =~ s/($foo)/ for (my $i=0, my $j=length $1, my $k=length $bar; $A=substr($foo,$i,1) and $B=substr($bar,$i,1); $i++) { if ( $A ge 'A' and $A le 'Z') { substr($bar,$i,1) = uc($B); } }$bar /giex;
Thanks!
by Anonymous Monk on Dec 02, 2000 at 04:20 UTC
    chipmonk, your solution gave me chills.... In my perfect universe, this would be the kind of stuff that would impress chicks. Bravo.

    I was basically doing what runrig suggested, but runrig's code is smaller and more idiomatic -- good for me to see, especially since I will have to pass on chipmonk's solution because of the limitations runrig mentioned.

    Thanks to all!

Re: Case-sensitive substitution with case-insensitive matches
by I0 (Priest) on Dec 05, 2000 at 10:19 UTC
    #if substituting non-alphabetics:
    s/("SAd"Boy)/ my($s,$b)=($1,"'bad'girl"); substr($s,-1) x= length$b; (lc$s^$s)&(lc$b^uc$b)^lc$b /ieg
Re: Case-sensitive substitution with case-insensitive matches
by Anonymous Monk on Dec 05, 2000 at 09:29 UTC
    s/(bad)/"sad"^$1^lc $1/egi
Re: Case-sensitive substitution with case-insensitive matches
by Anonymous Monk on Dec 05, 2000 at 09:58 UTC
    s/(word)/anotherword^substr($1^lc$1,0,1)/egi