grymater has asked for the wisdom of the Perl Monks concerning the following question:

Please, I need to transliterate a string in such a way that ONLY consecutive capital letters be made lowercase, e.g., this string:
AAAAAA Aaaa Aaaa AAA
Should come out like this:
aaaaaa Aaaa Aaaa aaa
Can tr///; be modified to do that? Any other solution is welcome.

Much thanks

Edit by tye, title, add CODE and P tags

Re: transliteration
by BUU (Prior) on Dec 05, 2003 at 17:19 UTC
    s/([A-Z][A-Z]+)/lc($1)/eg; should do the trick. Dunno how to do it with tr///, if that's even possible.
      This worked like a dream. Thanks very much!
Re: transliteration
by blokhead (Monsignor) on Dec 05, 2003 at 17:19 UTC
    Can tr///; be modified to do that?
    No. tr/// has no notion of context (i.e., the adjacent characters). It works on one character at a time. You need s/// to do replacements in a context-sensitive manner.
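
    A minimal sketch of that difference, using the sample string from the question (the variable names are only illustrative):

```perl
use strict;
use warnings;

my $input = 'AAAAAA Aaaa Aaaa AAA';

# tr/// maps each character independently, so every capital is lowered,
# including the single leading capitals that were supposed to stay.
(my $with_tr = $input) =~ tr/A-Z/a-z/;

# s/// can insist on two or more capitals in a row before replacing.
(my $with_s = $input) =~ s/([A-Z]{2,})/lc $1/ge;

print "$with_tr\n";   # aaaaaa aaaa aaaa aaa
print "$with_s\n";    # aaaaaa Aaaa Aaaa aaa
```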

    It's unclear from your examples whether ABABAB should be lowercased, whether the capital letters in AAaaa should be lowercased, or whether the entire "word" needs to be uppercase before anything is changed. Depending on your requirements, a simple substitution like s/([A-Z]{2,})/lc $1/ge will work. If the uppercase letters must all be the same letter, then you'll need a more complicated substitution. Here's a starting point:

    my @foo = qw/AAAAAA Aaaa Aaaa AAA/;
    s/(([A-Z])\2+)/lc $1/ge for @foo;
    print "@foo\n";   # aaaaaa Aaaa Aaaa aaa
    You may be able to do this with lookaheads as well. In fact, I'm sure we will see many clever ways to solve this problem ;)
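
    One possible lookaround version, offered only as a sketch and not as anything posted in this thread: lowercase each capital that has another capital directly before or after it. It assumes the "any run of two or more capitals" reading of the question, and the variable name is made up for the example.

```perl
use strict;
use warnings;

my $string = 'AAAAAA Aaaa Aaaa AAA';

# A capital qualifies if the previous character is a capital (lookbehind)
# or the next character is a capital (lookahead). The assertions consume
# nothing, so each match is exactly one letter.
$string =~ s/((?<=[A-Z])[A-Z]|[A-Z](?=[A-Z]))/lc $1/ge;

print "$string\n";   # aaaaaa Aaaa Aaaa aaa
```

    This does more work per character than the plain {2,} quantifier, so it is mostly of interest as an illustration.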

    blokhead

Re: transliterate only more than one uppercase character in a row
by davido (Cardinal) on Dec 06, 2003 at 07:48 UTC
    If you really mean that "AAaaa" should be lower-cased, but not "ABaaa", this will work:

    $string =~ s/(([A-Z])\2+)/lc $1/eg;

    On the other hand, if you want to also lower-case "ABaaa", this will do it:

    $string =~ s/([A-Z]{2,})/lc $1/eg;

    TIMTOWTDI. I provided these for the fun of using the {2,} style quantifier.
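
    To see where the two substitutions part ways, here is a small comparison run over the ambiguous inputs mentioned in this thread (the hash and variable names are assumptions made for the sketch):

```perl
use strict;
use warnings;

my %result;
for my $word (qw(AAaaa ABaaa ABABAB)) {
    # Backreference form: only runs of the SAME capital letter are lowered.
    (my $same_letter = $word) =~ s/(([A-Z])\2+)/lc $1/eg;

    # Quantifier form: any run of two or more capitals is lowered.
    (my $any_capitals = $word) =~ s/([A-Z]{2,})/lc $1/eg;

    $result{$word} = [ $same_letter, $any_capitals ];
    printf "%-6s  same letter: %-6s  any capitals: %s\n",
        $word, $same_letter, $any_capitals;
}
```

    Both forms turn AAaaa into aaaaa. Only the {2,} form changes ABaaa (to abaaa) and ABABAB (to ababab); the backreference form leaves those two alone.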

    By the way, if there's any possibility that you'll be working with character sets beyond plain ASCII (i.e., Unicode, etc.), all of the methods so far could break down. It's always a little risky to assume that every upper-case character falls within the range 'A'..'Z'. In other words, none of the solutions presented so far is portable across a broad range of languages and text encoding schemes. If that could be an issue, look into the POSIX character classes for regexps, and consider using a named character class to represent upper-case characters.

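    For completeness, here are the POSIX character class versions of the above regexes. Note that a POSIX class is only recognized inside a bracketed character class, hence the doubled brackets:

    $string =~ s/(([[:upper:]])\2+)/lc $1/eg;
    $string =~ s/([[:upper:]]{2,})/lc $1/eg;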
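
    A hedged illustration of the portability point, using Cyrillic letters written as \x{...} escapes so the example does not depend on how the file is saved. The sample text and variable names are assumptions. A POSIX class has to sit inside a bracketed character class, i.e. [[:upper:]]; written bare, [:upper:] is just a class containing the characters ':', 'u', 'p', 'e' and 'r'.

```perl
use strict;
use warnings;

binmode STDOUT, ':encoding(UTF-8)';

# Three capital DE letters (U+0414), a space, then a capitalized word.
my $text = "\x{414}\x{414}\x{414} \x{414}\x{43E}\x{43C}";

# The ASCII range never matches here, so the string is left untouched.
(my $ascii_only = $text) =~ s/([A-Z]{2,})/lc $1/eg;

# The POSIX class matches upper-case letters outside ASCII as well.
(my $posix = $text) =~ s/([[:upper:]]{2,})/lc $1/eg;

print "$ascii_only\n";
print "$posix\n";
```

    With this input the [A-Z] version should print the text unchanged, while the [[:upper:]] version lowercases only the run of three capitals.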


    Dave