Re: Case-sensitive substitution with case-insensitive matches
by chipmunk (Parson) on Dec 02, 2000 at 03:23 UTC
|
Here is one way to do it:
s/(bad)/$1 & (' ' x length $1) | 'SAD'/ie;
This works because the ASCII code for space (0x20) is exactly the bit that distinguishes between ASCII upper and lower case letters.
$1 & (' ' x length $1) returns a string containing nulls and spaces. Each uppercase character in $1 produces a null in the string, and each lowercase character produces a space.
Or-ing that with 'SAD' turns on the "lowercase bit" for each letter in 'SAD' that lines up with a space in our string. The characters that line up with nulls are unchanged.
So, effectively, this copies all the lowercase bits from $1 to 'SAD'.
This is hard for me to explain, please let me know if it's not clear. | [reply] [d/l] |
|
|
chipmunk: This is very impressive. Good work! But just to throw a wrench in there, this solution is going to have problems with EBCDIC and I don't know that it would work with UniCode (really haven't looked at it) so it does limit the portability of this really sweet hack.
That being said, I'm really impressed :)
Cheers,
Ovid
Join the Perlmonks Setiathome Group or just click on the the link and check out our stats.
| [reply] |
Re: Case-sensitive substitution with case-insensitive matches
by runrig (Abbot) on Dec 02, 2000 at 03:51 UTC
|
++ for chipmunk's answer, but for unicode or strings with non-letters in them, here's a less elegant solution (actually this is not tested with unicode, so someone correct me if I'm wrong): #!/usr/local/bin/perl -l -w
use strict;
my $str = "SAdBoy";
$str =~ s/($str)/fix_case($1, 'badgirl')/eig;
print $str;
sub fix_case {
my $match_word = shift;
my $replace_word = shift;
my $i = 0;
for (split '', $match_word) {
next if $_ eq lc;
substr($replace_word, $i, 1) = uc substr($replace_word, $i, 1);
} continue { $i++ }
return $replace_word;
}
Update: This and my other answer slightly updated, I just like it better now :-) | [reply] [d/l] |
|
|
runrig's solution also works very nicely with locales. In a locale where 'áÁéÉ' are all letters, for example, ($_ = "Á") =~ s/(á)/fix_case($1, 'é')/eig; will change $_ from "Á" to "É"!
| [reply] |
Re: Case-sensitive substitution with case-insensitive matches
by chromatic (Archbishop) on Dec 02, 2000 at 03:16 UTC
|
My initial thought for the capitalization is something like this:
s/(bad)/my $str = "sad"; if (substr($1, 0, 1) eq 'B') { ucfirst $str }; $str/gie;
but that strikes me as ugly. Another approach would be building a hash with the possibilities:
my %insens = (
bad => 'sad',
Bad => 'Sad',
bAd => 'sAd',
);
s/(bad)/$insens{$1}/gi;
Build the hash programmatically if necessary. | [reply] [d/l] [select] |
Re: Case-sensitive substitution with case-insensitive matches
by runrig (Abbot) on Dec 02, 2000 at 05:00 UTC
|
Another solution, if all words can be put in a hash (escaping any special regex characters): #!/usr/local/bin/perl -l -w
use strict;
my %repl_words = (sadboy=>'badboy', sadgirl=>'badgirl');
my $re = join('|', keys %repl_words);
$re = qr/$re/i;
my $str = "this SadBoy and SadGirl are...";
$str =~ s/($re)/fix_case($1, $repl_words{lc($1)})/eg;
print $str;
sub fix_case {
my $match_word = shift;
my $replace_word = shift;
my $i = 0;
for (split '', $match_word) {
next if $_ eq lc;
substr($replace_word, $i, 1) = uc substr($replace_word, $i, 1);
} continue { $i++ }
return $replace_word;
}
| [reply] [d/l] |
|
|
runrig, I feel like an idiot :)
Below I offered what I thought was a different
perspective on this problem, then I realized I had
redone your solution. Ooops.
I would, however, suggest that
your solution could be improved by using
split and
join as follows:
#!/usr/local/bin/perl -l -w
use strict;
my $str = "SAdBoy";
$str =~ s/($str)/fix_case($1, 'badgirl')/eig;
print $str;
sub fix_case {
my ($match_word, $replace_word) = @_;
my @rep = split //, $replace_word;
my $i = 0;
for (split '', $match_word) {
$rep[$i] = $_ eq lc($_) ? lc($rep[$i++]) : uc($rep[$i++]);
}
return join '', @rep;
}
I also tossed in a lc() so that
capitalization in the $replace_word doesn't
"contaminate" the pattern in the $match_word.
I offer this because I am of the impression that using
substr that often is somewhat
expensive.
| [reply] [d/l] |
Re: Case-sensitive substitution with case-insensitive matches
by snax (Hermit) on Dec 02, 2000 at 17:27 UTC
|
I thought a subroutine like the following would
be more intuitive. Seems that pack
and unpack handle 8 bit chars
properly with the 'a' template, too.
Note that this sub takes the capitalization template
of the original word to be absolute, ignoring the
case of the replacement. Drop the lc()
call in the second map to have it respect
capitalization in the replacement.
#!/usr/local/bin/perl -w
use strict;
use warnings;
sub capsub ($$) {
my ($old, $new) = @_;
(my $len = length($new)) == length($old) or die "Won't work.\n";
# Find the UC chars
my @uc = map {($_ eq uc($_))? 1:0} unpack('a' x $len, $old);
# Do the swap ignoring case
$old =~ s/$old/$new/i;
# Redo the capitalization
my $j = 0;
@uc = map {$uc[$j++] ? uc($_) : lc($_)} unpack('a' x $len, $old);
# Put it back together
return pack('a' x $len, @uc);
}
my $x = capsub('BaD', 'sad');
my $y = capsub('dOOd','LeeT');
print $x, $/, $y, $/;
__END__
SaD
lEEt
Update:
Why on earth am I using pack and
unpack? Must be something in the
water.
Replace those calls with split and
join calls and it's a little more
sensible, and I can drop the $len variable.
Finally, the bit that says, "Do the swap ignoring
case" is just useless. That bit always
returns $new, so I should just use $new
in the next unpack (or split).
The fact is, this simply copies the capitalization pattern
from the first argument to the second. This can then be
used in a regex with the e modifier to obtain
the desired results.
| [reply] [d/l] |
Re: Case-sensitive substitution with case-insensitive matches
by turnstep (Parson) on Dec 02, 2000 at 05:12 UTC
|
my $foo = "AbC";
my $bar = "Ducks";
$string = "ABCDEFGHIJabc";
$string =~ s/($foo)/
for (my $i=0, my $j=length $1, my $k=length $bar;
$A=substr($foo,$i,1) and $B=substr($bar,$i,1);
$i++) {
if (
$A ge 'A' and $A le 'Z') {
substr($bar,$i,1) = uc($B);
}
}$bar
/giex;
| [reply] [d/l] |
Thanks!
by Anonymous Monk on Dec 02, 2000 at 04:20 UTC
|
chipmonk, your solution gave me chills.... In my perfect universe, this would be the kind of stuff that would impress chicks. Bravo.
I was basically doing what runrig suggested, but runrig's code is smaller and more idiomatic -- good for me to see, especially since I will have to pass on chipmonk's solution because of the limitations runrig mentioned.
Thanks to all! | [reply] |
Re: Case-sensitive substitution with case-insensitive matches
by I0 (Priest) on Dec 05, 2000 at 10:19 UTC
|
#if substituting non-alphabetics:
s/("SAd"Boy)/
my($s,$b)=($1,"'bad'girl");
substr($s,-1) x= length$b;
(lc$s^$s)&(lc$b^uc$b)^lc$b
/ieg
| [reply] [d/l] |
Re: Case-sensitive substitution with case-insensitive matches
by Anonymous Monk on Dec 05, 2000 at 09:29 UTC
|
s/(bad)/"sad"^$1^lc $1/egi
| [reply] |
Re: Case-sensitive substitution with case-insensitive matches
by Anonymous Monk on Dec 05, 2000 at 09:58 UTC
|
s/(word)/anotherword^substr($1^lc$1,0,1)/egi | [reply] |