global match except last one

utku has asked for the wisdom of the Perl Monks concerning the following question:

Replies are listed 'Best First'.
Re: global match except last one by ikegami (Patriarch) on Jul 12, 2011 at 15:50 UTC
`s/([a-z]+)(?=(.?))/ uc($1) . (length($2) ? "_" : "") /ge;` [download]	[reply] [d/l]
Re^2: global match except last one by utku (Acolyte) on Jul 12, 2011 at 15:57 UTC
Thank you very much ikegami!! Respect. Long live the Perl monks, you deserve any kind compliment!	[reply]
Re: global match except last one by happy.barney (Friar) on Jul 12, 2011 at 15:55 UTC
look at "look around assertions" `$str =~ s/(?<=[a-z])(?=[A-Z])/_/g; say uc $str;` [download]	[reply] [d/l]
Re: global match except last one by jpl (Monk) on Jul 12, 2011 at 18:16 UTC
Always curious, I wondered if the apparently complex lookahead logic proposed by ikegami and happy.barney would actually save time over your original approach, ignoring, as we should not, that your original approach made some bold assumptions about where the final underscore would appear. I factored out the final uppercasing to make the code more nearly comparable, and came up with use Benchmark('countit'); $code = '$str =~ s/(?<=[a-z])(?=[A-Z])/_/g'; $t = countit(5, '$str="BuyACaseOfCamels";' . $code); $count = $t->iters ; print "$count loops of $code\n"; $code = '$str =~ s/([a-z]+)(?=(.?))/ $1 . (length($2) ? "_" : "") /ge' +; $t = countit(5, '$str="BuyACaseOfCamels";' . $code); $count = $t->iters ; print "$count loops of $code\n"; $code = '$str =~ s/([a-z]+)/$1_/g; $str =~ s/_$//'; $t = countit(5, '$str="BuyACaseOfCamels";' . $code); $count = $t->iters ; print "$count loops of $code\n"; [download] I was a bit surprised at the results. perl ccase.pl 1686588 loops of $str =~ s/(?<=[a-z])(?=[A-Z])/_/g 1194666 loops of $str =~ s/([a-z]+)(?=(.?))/ $1 . (length($2) ? "_" : +"") /ge 1520477 loops of $str =~ s/([a-z]+)/$1_/g; $str =~ s/_$// perl ccase.pl 1793776 loops of $str =~ s/(?<=[a-z])(?=[A-Z])/_/g 1286561 loops of $str =~ s/([a-z]+)(?=(.?))/ $1 . (length($2) ? "_" : +"") /ge 1466174 loops of $str =~ s/([a-z]+)/$1_/g; $str =~ s/_$// perl ccase.pl 1760558 loops of $str =~ s/(?<=[a-z])(?=[A-Z])/_/g 1336044 loops of $str =~ s/([a-z]+)(?=(.?))/ $1 . (length($2) ? "_" : +"") /ge 1492832 loops of $str =~ s/([a-z]+)/$1_/g; $str =~ s/_$// [download] Over three runs, the counts varied slightly, but happy.barney's code consistently outperformed your original code, and ikegami's code was only slightly less peppy than yours, a fair tradeoff for doing a better job of trimming the final underscore. I, for one, am impressed at how well the regular expression engine can perform.	[reply] [d/l] [select]
Re: global match except last one by AnomalousMonk (Archbishop) on Jul 13, 2011 at 00:35 UTC
Here's an approach that doesn't depend on `/e` evaluation or on separate upper-casing (Update: and handles strings with mixed camelCase and non-camelCase words). Note this handles the degenerate camelCase 'aB' correctly, except I'm not sure just what is 'correct' camelCase in this case. No attempt made at benchmarking. Tested under 5.8.9 and 5.12.3. >perl -wMstrict -le "unshift @ARGV, 'not foo camelCase NOT BAR namesCanBeDifferent oK Not Baz'; ;; for (@ARGV) { print qq{'$_'}; s{ ([[:lower:]]) ((?<=[[:lower:]]) [[:upper:]][[:lower:]]) } {\U$1_$2}xmsg; print qq{'$_' \n}; } " "aB aB aB" " aB " "aBc aBc aBc" " aBc " "No No No" " No " "Not Not Not" " Not " 'not foo camelCase NOT BAR namesCanBeDifferent oK Not Baz' 'not foo CAMEL_CASE NOT BAR NAMES_CAN_BE_DIFFERENT O_K Not Baz' 'aB aB aB' 'A_B A_B A_B' ' aB ' ' A_B ' 'aBc aBc aBc' 'A_BC A_BC A_BC' ' aBc ' ' A_BC ' 'No No No' 'No No No' ' No ' ' No ' 'Not Not Not' 'Not Not Not' ' Not ' ' Not ' [download]	[reply] [d/l] [select]
Re: global match except last one by 7stud (Deacon) on Jul 13, 2011 at 10:40 UTC
Another way: `use strict; use warnings; use 5.010; my @strings = qw{ aB namesCanBeDifferentAny helloWorld }; for my $str (@strings) { $str =~ s{ ( [a-z]* ) ( [A-Z] [a-z]* ) } {\U$1_$2}xmsg; say $str; } --output:-- A_B NAMES_CAN_BE_DIFFERENT_ANY HELLO_WORLD` [download] ...but alas it is very slow, producing half as many loops as happy.barney's lookaround. Who would have thought that a zero width match could be replaced by something?	[reply] [d/l]
Re: global match except last one by dbs (Sexton) on Jul 13, 2011 at 17:29 UTC
It does not work with `# perl -le '$str='Camelcase'; $str =~ s/(?<=[a-z])(?=[A-Z])/_/g; print + $str;' Camelcase` [download]	[reply] [d/l]
Re^2: global match except last one by 7stud (Deacon) on Jul 13, 2011 at 23:02 UTC
Well, just because you write the word 'camlecase' does not mean it's in camel case format.	[reply]