s/([a-z]+)(?=(.?))/ uc($1) . (length($2) ? "_" : "") /ge;
Thank you very much ikegami!!
Respect. Long live the Perl monks, you deserve every kind of compliment!
Look at "lookaround assertions":
$str =~ s/(?<=[a-z])(?=[A-Z])/_/g;
say uc $str;
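To make the mechanism concrete, here is a minimal sketch of why that substitution works. Both assertions are zero-width, so the pattern matches the empty string between a lowercase and an uppercase letter, and `s///g` only inserts the underscore there. The helper names `camel_to_upper_snake` and `boundary_offsets` are made up for illustration and do not appear in the thread.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): insert "_" at every
# lowercase-to-uppercase boundary, then upper-case the whole string.
sub camel_to_upper_snake {
    my ($str) = @_;
    # (?<=[a-z]) looks back for a lowercase letter and (?=[A-Z]) looks
    # ahead for an uppercase one. Neither consumes characters, so the
    # match is the empty string between them and nothing is replaced.
    $str =~ s/(?<=[a-z])(?=[A-Z])/_/g;
    return uc $str;
}

# Hypothetical helper: collect the offsets where the zero-width pattern matches.
sub boundary_offsets {
    my ($str) = @_;
    my @offsets;
    push @offsets, pos($str) while $str =~ /(?<=[a-z])(?=[A-Z])/g;
    return @offsets;
}

print join(',', boundary_offsets('namesCanBeDifferent')), "\n";  # 5,8,10
print camel_to_upper_snake('namesCanBeDifferent'), "\n";         # NAMES_CAN_BE_DIFFERENT
print camel_to_upper_snake('Camelcase'), "\n";                   # CAMELCASE (no boundary found)
```

Note that a run of capitals such as "ACase" is left unsplit, because there is no lowercase letter before the second capital.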
Always curious, I wondered whether the apparently complex lookaround logic proposed by ikegami and happy.barney would actually save time over your original approach. (We should not ignore that your original approach made some bold assumptions about where the final underscore would appear.)
I factored out the final uppercasing to make the code more nearly comparable, and came up with:
use Benchmark('countit');
$code = '$str =~ s/(?<=[a-z])(?=[A-Z])/_/g';
$t = countit(5, '$str="BuyACaseOfCamels";' . $code);
$count = $t->iters ;
print "$count loops of $code\n";
$code = '$str =~ s/([a-z]+)(?=(.?))/ $1 . (length($2) ? "_" : "") /ge';
$t = countit(5, '$str="BuyACaseOfCamels";' . $code);
$count = $t->iters ;
print "$count loops of $code\n";
$code = '$str =~ s/([a-z]+)/$1_/g; $str =~ s/_$//';
$t = countit(5, '$str="BuyACaseOfCamels";' . $code);
$count = $t->iters ;
print "$count loops of $code\n";
I was a bit surprised at the results.
perl ccase.pl
1686588 loops of $str =~ s/(?<=[a-z])(?=[A-Z])/_/g
1194666 loops of $str =~ s/([a-z]+)(?=(.?))/ $1 . (length($2) ? "_" : "") /ge
1520477 loops of $str =~ s/([a-z]+)/$1_/g; $str =~ s/_$//
perl ccase.pl
1793776 loops of $str =~ s/(?<=[a-z])(?=[A-Z])/_/g
1286561 loops of $str =~ s/([a-z]+)(?=(.?))/ $1 . (length($2) ? "_" : "") /ge
1466174 loops of $str =~ s/([a-z]+)/$1_/g; $str =~ s/_$//
perl ccase.pl
1760558 loops of $str =~ s/(?<=[a-z])(?=[A-Z])/_/g
1336044 loops of $str =~ s/([a-z]+)(?=(.?))/ $1 . (length($2) ? "_" : "") /ge
1492832 loops of $str =~ s/([a-z]+)/$1_/g; $str =~ s/_$//
Over three runs the counts varied, but happy.barney's code consistently outperformed your original code (by roughly 11 to 22 percent), while ikegami's code ran roughly 10 to 21 percent fewer iterations than yours, a fair tradeoff for doing a better job of trimming the final underscore. I, for one, am impressed at how well the regular expression engine can perform.
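For anyone who wants to repeat the comparison, the same three substitutions can be fed to Benchmark's `cmpthese`, which prints a rate table with relative percentages instead of raw loop counts. This is only a sketch: the hash `%variant`, its key names, and the one-CPU-second duration are my own choices, and the timings will differ from machine to machine.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

my $input = 'BuyACaseOfCamels';

# Each variant works on a fresh copy of the input and returns the result.
# The key names are illustrative labels, not names used in the thread.
my %variant = (
    lookaround => sub {
        my $s = $input;
        $s =~ s/(?<=[a-z])(?=[A-Z])/_/g;
        $s;
    },
    eval_flag => sub {
        my $s = $input;
        $s =~ s/([a-z]+)(?=(.?))/ $1 . (length($2) ? "_" : "") /ge;
        $s;
    },
    two_pass => sub {
        my $s = $input;
        $s =~ s/([a-z]+)/$1_/g;
        $s =~ s/_$//;
        $s;
    },
);

# Sanity check before timing: for this input all three variants
# produce Buy_ACase_Of_Camels.
for my $name (sort keys %variant) {
    printf "%-10s => %s\n", $name, $variant{$name}->();
}

# A negative count means "run each variant for at least that many CPU seconds".
cmpthese(-1, \%variant);
```

Checking that the variants agree on the output before timing them guards against comparing code that does different work.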
Here's an approach that doesn't depend on /e evaluation or on separate upper-casing (Update: and handles strings with mixed camelCase and non-camelCase words). Note this handles the degenerate camelCase 'aB' correctly, except I'm not sure just what is 'correct' camelCase in this case. No attempt made at benchmarking. Tested under 5.8.9 and 5.12.3.
>perl -wMstrict -le
"unshift @ARGV,
'not foo camelCase NOT BAR namesCanBeDifferent oK Not Baz';
;;
for (@ARGV) {
print qq{'$_'};
s{ ([[:lower:]]*) ((?<=[[:lower:]]) [[:upper:]][[:lower:]]*) }
{\U$1_$2}xmsg;
print qq{'$_' \n};
}
"
"aB aB aB" " aB " "aBc aBc aBc" " aBc "
"No No No" " No " "Not Not Not" " Not "
'not foo camelCase NOT BAR namesCanBeDifferent oK Not Baz'
'not foo CAMEL_CASE NOT BAR NAMES_CAN_BE_DIFFERENT O_K Not Baz'
'aB aB aB'
'A_B A_B A_B'
' aB '
' A_B '
'aBc aBc aBc'
'A_BC A_BC A_BC'
' aBc '
' A_BC '
'No No No'
'No No No'
' No '
' No '
'Not Not Not'
'Not Not Not'
' Not '
' Not '
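The lookbehind is what keeps capitalised words such as 'Not' untouched. As a rough illustration, the sketch below wraps the substitution in two hypothetical subs, `with_lookbehind` and `without_lookbehind` (names invented here), that differ only in that one assertion.

```perl
use strict;
use warnings;

# Hypothetical wrapper around the substitution shown above.
sub with_lookbehind {
    my ($text) = @_;
    # The capital must be preceded by a lowercase letter, so a word that
    # merely starts with a capital never matches.
    $text =~ s{ ([[:lower:]]*) ((?<=[[:lower:]]) [[:upper:]][[:lower:]]*) }
              {\U$1_$2}xmsg;
    return $text;
}

# Same pattern with the lookbehind dropped, for comparison only.
sub without_lookbehind {
    my ($text) = @_;
    # $1 may now be empty in front of any capital, including a leading one.
    $text =~ s{ ([[:lower:]]*) ([[:upper:]][[:lower:]]*) }
              {\U$1_$2}xmsg;
    return $text;
}

print with_lookbehind('camelCase Not Baz'), "\n";     # CAMEL_CASE Not Baz
print without_lookbehind('camelCase Not Baz'), "\n";  # CAMEL_CASE _NOT _BAZ
```

Without the assertion, every capitalised word gains a leading underscore and is upper-cased, which is why the stricter pattern is the safer choice for mixed text.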
use strict;
use warnings;
use 5.010;
my @strings = qw{
aB
namesCanBeDifferentAny
helloWorld
};
for my $str (@strings) {
$str =~ s{
(
[a-z]*
)
(
[A-Z]
[a-z]*
)
}
{\U$1_$2}xmsg;
say $str;
}
--output:--
A_B
NAMES_CAN_BE_DIFFERENT_ANY
HELLO_WORLD
...but alas it is very slow, producing half as many loops as happy.barney's lookaround. Who would have thought that a zero-width match could be replaced by something?
# perl -le '$str='Camelcase'; $str =~ s/(?<=[a-z])(?=[A-Z])/_/g; print $str;'
Camelcase
Well, just because you write the word 'Camelcase' does not mean it's in camel case format.