utku has asked for the wisdom of the Perl Monks concerning the following question:

I want to reduce following to substitutions into one, is that possible?

# convert medial capital into capital separated by "_" $str =~ s/([a-z]+)/sprintf(uc($1)."_")/ge; $str =~ s/_$//;
"g" matches globally, eg. if I have "camelCase" then first substitution ends with "CAMEL_CASE_". I want to save 2nd substitution. Names can be any length in runtime, ie. such as "namesCanBeDifferentAny" etc. Is there a way to stop regexp engine in the (n-1)th match, providing that there are n matches?

Replies are listed 'Best First'.
Re: global match except last one
by ikegami (Patriarch) on Jul 12, 2011 at 15:50 UTC
    s/([a-z]+)(?=(.?))/ uc($1) . (length($2) ? "_" : "") /ge;
      Thank you very much ikegami!! Respect. Long live the Perl monks, you deserve any kind compliment!
Re: global match except last one
by happy.barney (Friar) on Jul 12, 2011 at 15:55 UTC
    look at "look around assertions"
    $str =~ s/(?<=[a-z])(?=[A-Z])/_/g; say uc $str;
Re: global match except last one
by jpl (Monk) on Jul 12, 2011 at 18:16 UTC
    Always curious, I wondered if the apparently complex lookahead logic proposed by ikegami and happy.barney would actually save time over your original approach, ignoring, as we should not, that your original approach made some bold assumptions about where the final underscore would appear. I factored out the final uppercasing to make the code more nearly comparable, and came up with
    use Benchmark('countit'); $code = '$str =~ s/(?<=[a-z])(?=[A-Z])/_/g'; $t = countit(5, '$str="BuyACaseOfCamels";' . $code); $count = $t->iters ; print "$count loops of $code\n"; $code = '$str =~ s/([a-z]+)(?=(.?))/ $1 . (length($2) ? "_" : "") /ge' +; $t = countit(5, '$str="BuyACaseOfCamels";' . $code); $count = $t->iters ; print "$count loops of $code\n"; $code = '$str =~ s/([a-z]+)/$1_/g; $str =~ s/_$//'; $t = countit(5, '$str="BuyACaseOfCamels";' . $code); $count = $t->iters ; print "$count loops of $code\n";
    I was a bit surprised at the results.
    perl ccase.pl 1686588 loops of $str =~ s/(?<=[a-z])(?=[A-Z])/_/g 1194666 loops of $str =~ s/([a-z]+)(?=(.?))/ $1 . (length($2) ? "_" : +"") /ge 1520477 loops of $str =~ s/([a-z]+)/$1_/g; $str =~ s/_$// perl ccase.pl 1793776 loops of $str =~ s/(?<=[a-z])(?=[A-Z])/_/g 1286561 loops of $str =~ s/([a-z]+)(?=(.?))/ $1 . (length($2) ? "_" : +"") /ge 1466174 loops of $str =~ s/([a-z]+)/$1_/g; $str =~ s/_$// perl ccase.pl 1760558 loops of $str =~ s/(?<=[a-z])(?=[A-Z])/_/g 1336044 loops of $str =~ s/([a-z]+)(?=(.?))/ $1 . (length($2) ? "_" : +"") /ge 1492832 loops of $str =~ s/([a-z]+)/$1_/g; $str =~ s/_$//
    Over three runs, the counts varied slightly, but happy.barney's code consistently outperformed your original code, and ikegami's code was only slightly less peppy than yours, a fair tradeoff for doing a better job of trimming the final underscore. I, for one, am impressed at how well the regular expression engine can perform.
Re: global match except last one
by AnomalousMonk (Archbishop) on Jul 13, 2011 at 00:35 UTC

    Here's an approach that doesn't depend on  /e evaluation or on separate upper-casing (Update: and handles strings with mixed camelCase and non-camelCase words). Note this handles the degenerate camelCase 'aB' correctly, except I'm not sure just what is 'correct' camelCase in this case. No attempt made at benchmarking. Tested under 5.8.9 and 5.12.3.

    >perl -wMstrict -le "unshift @ARGV, 'not foo camelCase NOT BAR namesCanBeDifferent oK Not Baz'; ;; for (@ARGV) { print qq{'$_'}; s{ ([[:lower:]]*) ((?<=[[:lower:]]) [[:upper:]][[:lower:]]*) } {\U$1_$2}xmsg; print qq{'$_' \n}; } " "aB aB aB" " aB " "aBc aBc aBc" " aBc " "No No No" " No " "Not Not Not" " Not " 'not foo camelCase NOT BAR namesCanBeDifferent oK Not Baz' 'not foo CAMEL_CASE NOT BAR NAMES_CAN_BE_DIFFERENT O_K Not Baz' 'aB aB aB' 'A_B A_B A_B' ' aB ' ' A_B ' 'aBc aBc aBc' 'A_BC A_BC A_BC' ' aBc ' ' A_BC ' 'No No No' 'No No No' ' No ' ' No ' 'Not Not Not' 'Not Not Not' ' Not ' ' Not '
Re: global match except last one
by 7stud (Deacon) on Jul 13, 2011 at 10:40 UTC

    Another way:

    use strict; use warnings; use 5.010; my @strings = qw{ aB namesCanBeDifferentAny helloWorld }; for my $str (@strings) { $str =~ s{ ( [a-z]* ) ( [A-Z] [a-z]* ) } {\U$1_$2}xmsg; say $str; } --output:-- A_B NAMES_CAN_BE_DIFFERENT_ANY HELLO_WORLD
    ...but alas it is very slow, producing half as many loops as happy.barney's lookaround. Who would have thought that a zero width match could be replaced by something?
Re: global match except last one
by dbs (Sexton) on Jul 13, 2011 at 17:29 UTC
    It does not work with
    # perl -le '$str='Camelcase'; $str =~ s/(?<=[a-z])(?=[A-Z])/_/g; print + $str;' Camelcase
      Well, just because you write the word 'camlecase' does not mean it's in camel case format.