Case-sensitive substitution with case-insensitive matches

Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

Replies are listed 'Best First'.
Re: Case-sensitive substitution with case-insensitive matches by chipmunk (Parson) on Dec 02, 2000 at 03:23 UTC
Here is one way to do it: `s/(bad)/$1 & (' ' x length $1) \| 'SAD'/ie;` This works because the ASCII code for space (0x20) is exactly the bit that distinguishes between ASCII upper and lower case letters. `$1 & (' ' x length $1)` returns a string containing nulls and spaces. Each uppercase character in $1 produces a null in the string, and each lowercase character produces a space. Or-ing that with 'SAD' turns on the "lowercase bit" for each letter in 'SAD' that lines up with a space in our string. The characters that line up with nulls are unchanged. So, effectively, this copies all the lowercase bits from $1 to 'SAD'. This is hard for me to explain, please let me know if it's not clear.	[reply] [d/l]
(Ovid) Re: Re: Case-sensitive substitution with case-insensitive matches by Ovid (Cardinal) on Dec 02, 2000 at 05:05 UTC
chipmunk: This is very impressive. Good work! But just to throw a wrench in there, this solution is going to have problems with EBCDIC and I don't know that it would work with UniCode (really haven't looked at it) so it does limit the portability of this really sweet hack. That being said, I'm really impressed :) Cheers, Ovid Join the Perlmonks Setiathome Group or just click on the the link and check out our stats.	[reply]
Re: Case-sensitive substitution with case-insensitive matches by runrig (Abbot) on Dec 02, 2000 at 03:51 UTC
++ for chipmunk's answer, but for unicode or strings with non-letters in them, here's a less elegant solution (actually this is not tested with unicode, so someone correct me if I'm wrong): `#!/usr/local/bin/perl -l -w use strict; my $str = "SAdBoy"; $str =~ s/($str)/fix_case($1, 'badgirl')/eig; print $str; sub fix_case { my $match_word = shift; my $replace_word = shift; my $i = 0; for (split '', $match_word) { next if $_ eq lc; substr($replace_word, $i, 1) = uc substr($replace_word, $i, 1); } continue { $i++ } return $replace_word; }` [download] Update: This and my other answer slightly updated, I just like it better now :-)	[reply] [d/l]
Re: Re: Case-sensitive substitution with case-insensitive matches by chipmunk (Parson) on Dec 02, 2000 at 21:31 UTC
runrig's solution also works very nicely with locales. In a locale where `'áÁéÉ'` are all letters, for example, `($_ = "Á") =~ s/(á)/fix_case($1, 'é')/eig;` will change `$_` from `"Á"` to `"É"`!	[reply]
Re: Case-sensitive substitution with case-insensitive matches by chromatic (Archbishop) on Dec 02, 2000 at 03:16 UTC
My initial thought for the capitalization is something like this: `s/(bad)/my $str = "sad"; if (substr($1, 0, 1) eq 'B') { ucfirst $str }; $str/gie;` but that strikes me as ugly. Another approach would be building a hash with the possibilities: `my %insens = ( bad => 'sad', Bad => 'Sad', bAd => 'sAd', ); s/(bad)/$insens{$1}/gi;` [download] Build the hash programmatically if necessary.	[reply] [d/l] [select]
Re: Case-sensitive substitution with case-insensitive matches by runrig (Abbot) on Dec 02, 2000 at 05:00 UTC
Another solution, if all words can be put in a hash (escaping any special regex characters): `#!/usr/local/bin/perl -l -w use strict; my %repl_words = (sadboy=>'badboy', sadgirl=>'badgirl'); my $re = join('\|', keys %repl_words); $re = qr/$re/i; my $str = "this SadBoy and SadGirl are..."; $str =~ s/($re)/fix_case($1, $repl_words{lc($1)})/eg; print $str; sub fix_case { my $match_word = shift; my $replace_word = shift; my $i = 0; for (split '', $match_word) { next if $_ eq lc; substr($replace_word, $i, 1) = uc substr($replace_word, $i, 1); } continue { $i++ } return $replace_word; }` [download]	[reply] [d/l]
Re: Re: Case-sensitive substitution with case-insensitive matches by snax (Hermit) on Dec 02, 2000 at 20:56 UTC
runrig, I feel like an idiot :) Below I offered what I thought was a different perspective on this problem, then I realized I had redone your solution. Ooops. I would, however, suggest that your solution could be improved by using `split` and `join` as follows: `#!/usr/local/bin/perl -l -w use strict; my $str = "SAdBoy"; $str =~ s/($str)/fix_case($1, 'badgirl')/eig; print $str; sub fix_case { my ($match_word, $replace_word) = @_; my @rep = split //, $replace_word; my $i = 0; for (split '', $match_word) { $rep[$i] = $_ eq lc($_) ? lc($rep[$i++]) : uc($rep[$i++]); } return join '', @rep; }` [download] I also tossed in a `lc()` so that capitalization in the `$replace_word` doesn't "contaminate" the pattern in the `$match_word`. I offer this because I am of the impression that using `substr` that often is somewhat expensive.	[reply] [d/l]
Re: Case-sensitive substitution with case-insensitive matches by snax (Hermit) on Dec 02, 2000 at 17:27 UTC
I thought a subroutine like the following would be more intuitive. Seems that pack and unpack handle 8 bit chars properly with the `'a'` template, too. Note that this sub takes the capitalization template of the original word to be absolute, ignoring the case of the replacement. Drop the lc() call in the second map to have it respect capitalization in the replacement. #!/usr/local/bin/perl -w use strict; use warnings; sub capsub ($$) { my ($old, $new) = @_; (my $len = length($new)) == length($old) or die "Won't work.\n"; # Find the UC chars my @uc = map {($_ eq uc($_))? 1:0} unpack('a' x $len, $old); # Do the swap ignoring case $old =~ s/$old/$new/i; # Redo the capitalization my $j = 0; @uc = map {$uc[$j++] ? uc($_) : lc($_)} unpack('a' x $len, $old); # Put it back together return pack('a' x $len, @uc); } my $x = capsub('BaD', 'sad'); my $y = capsub('dOOd','LeeT'); print $x, $/, $y, $/; __END__ SaD lEEt [download] Update: Why on earth am I using pack and unpack? Must be something in the water. Replace those calls with split and join calls and it's a little more sensible, and I can drop the `$len` variable. Finally, the bit that says, "Do the swap ignoring case" is just useless. That bit always returns `$new`, so I should just use `$new` in the next `unpack` (or `split`). The fact is, this simply copies the capitalization pattern from the first argument to the second. This can then be used in a regex with the `e` modifier to obtain the desired results.	[reply] [d/l]
Re: Case-sensitive substitution with case-insensitive matches by turnstep (Parson) on Dec 02, 2000 at 05:12 UTC
Here is Another Way: `my $foo = "AbC"; my $bar = "Ducks"; $string = "ABCDEFGHIJabc"; $string =~ s/($foo)/ for (my $i=0, my $j=length $1, my $k=length $bar; $A=substr($foo,$i,1) and $B=substr($bar,$i,1); $i++) { if ( $A ge 'A' and $A le 'Z') { substr($bar,$i,1) = uc($B); } }$bar /giex;` [download]	[reply] [d/l]
Thanks! by Anonymous Monk on Dec 02, 2000 at 04:20 UTC
chipmonk, your solution gave me chills.... In my perfect universe, this would be the kind of stuff that would impress chicks. Bravo. I was basically doing what runrig suggested, but runrig's code is smaller and more idiomatic -- good for me to see, especially since I will have to pass on chipmonk's solution because of the limitations runrig mentioned. Thanks to all!	[reply]
Re: Case-sensitive substitution with case-insensitive matches by I0 (Priest) on Dec 05, 2000 at 10:19 UTC
#if substituting non-alphabetics: `s/("SAd"Boy)/ my($s,$b)=($1,"'bad'girl"); substr($s,-1) x= length$b; (lc$s^$s)&(lc$b^uc$b)^lc$b /ieg` [download]	[reply] [d/l]
Re: Case-sensitive substitution with case-insensitive matches by Anonymous Monk on Dec 05, 2000 at 09:29 UTC
s/(bad)/"sad"^$1^lc $1/egi	[reply]
Re: Case-sensitive substitution with case-insensitive matches by Anonymous Monk on Dec 05, 2000 at 09:58 UTC
s/(word)/anotherword^substr($1^lc$1,0,1)/egi	[reply]