rkg has asked for the wisdom of the Perl Monks concerning the following question:

I am trying to change words in full upper case into proper case within a string. Could someone explain the problem with this regexp?
my $a= ' I have SOME WORDS in full CAPS ';
$a =~ s{\b([A-Z]{4,})\b/}{ucfirst(lc($1))}eg;
print $a, "\n";
# wanted: I have Some Words in full Caps
Thank you
rkg

Re: Quick regexp question
by liz (Monsignor) on Oct 05, 2003 at 13:55 UTC
    $a =~ s{\b([A-Z]{4,})\b}{ucfirst(lc($1))}eg;
    #                     ^^^ slash removed
    You realize of course that there are many other (probably better) ways to do this. In particular, I wonder why you would need the \b, as [A-Z]{4,} is greedy and will take whole words on a match. And then there is of course the \L escape sequence. I would probably have done it like this:
    $a =~ s#([A-Z])([A-Z]{3,})#$1\L$2\E#g;

    Liz

      If you're going to better him on his code, you might just as well do it the proper way. Eh... the shorter way. :)
      $a =~ s#([A-Z]{4,})#\u\L$1#g;
        To be honest, I didn't know you could chain \u\L that way.

        Another "nice idiom learned at the Monastery" for me today!

        Thanks, bart!

        Liz
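The chained escapes can be checked against the original ucfirst(lc(...)) form. The following is a minimal sketch, assuming plain ASCII input; the sub names proper_with_functions and proper_with_escapes are hypothetical and exist only for this illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: the function-call form from the original question,
# which needs the /e modifier to evaluate the replacement as code.
sub proper_with_functions {
    my ($text) = @_;
    $text =~ s{([A-Z]{4,})}{ucfirst(lc($1))}eg;
    return $text;
}

# Hypothetical helper: the escape-sequence form. \L lowercases what follows,
# \u uppercases the next character, so no /e is needed.
sub proper_with_escapes {
    my ($text) = @_;
    $text =~ s{([A-Z]{4,})}{\u\L$1}g;
    return $text;
}

my $sample = 'I have SOME WORDS in full CAPS';
print proper_with_functions($sample), "\n";
print proper_with_escapes($sample), "\n";
```

Both calls should print the same line, since the replacement part of s/// is interpolated like a double-quoted string.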

      Without the \b, you'd change fooBARBAZ into fooBarbaz. With the \b, you'd leave it unmodified.

      Abigail
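The difference the \b anchors make can be seen with a small comparison. This is only a sketch; the sub names proper_unanchored and proper_anchored are made up for the example.

```perl
use strict;
use warnings;

# Hypothetical helper: no word-boundary anchors, so a run of capitals
# inside a longer word is also rewritten.
sub proper_unanchored {
    my ($text) = @_;
    $text =~ s{([A-Z]{4,})}{\u\L$1}g;
    return $text;
}

# Hypothetical helper: \b on both sides, so only whole words made up
# entirely of capitals are rewritten.
sub proper_anchored {
    my ($text) = @_;
    $text =~ s{\b([A-Z]{4,})\b}{\u\L$1}g;
    return $text;
}

for my $word ('fooBARBAZ', 'BARBAZ') {
    printf "%-10s unanchored: %-10s anchored: %s\n",
        $word, proper_unanchored($word), proper_anchored($word);
}
```

With the anchors, there is no word boundary between "o" and "B" in fooBARBAZ, so the pattern cannot match there and the string is left alone.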

Re: Quick regexp question
by arno (Scribe) on Oct 05, 2003 at 13:56 UTC
    Hi, your code is almost perfect; you've just left a stray '/' at the end of your pattern:
    $a =~ s{\b([A-Z]{4,})\b}{ucfirst(lc($1))}eg;
    will work

    Arnaud
Re: Quick regexp question
by rkg (Hermit) on Oct 05, 2003 at 13:55 UTC
    D'oh! A vestigial slash hidden in the pattern. s{\b([A-Z]{4,})\b}{ucfirst(lc($1))}eg; is what I want. Thanks, got it.
Re: Quick regexp question
by Cody Pendant (Prior) on Oct 05, 2003 at 23:31 UTC
    Just a thought, your code is going to make things like "CIA" and "NASA" come out as "Cia" and "Nasa" -- is that OK?


    ($_='kkvvttuubbooppuuiiffssqqffssmmiibbddllffss') =~y~b-v~a-z~s; print
      Yep, imperfect, but it is the best I can do given the large volume of text I need to process, and it meets the level of accuracy of the spec.


      <humor> Maybe Perl 6 will offer new flavors of \b word boundary tags which'll be smart enough to recognize the start of well-known acronyms...</humor>

        You could always define a hash of any acronyms that are likely to be there (download a list from somewhere) and put "unless defined $dontchange{$word}" before your regex.
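One way to sketch the acronym-hash idea is shown here. The hash name %dontchange comes from the suggestion above; its contents and the sub name proper_case_except_acronyms are assumptions for illustration only. Because the substitution walks the whole string rather than one word at a time, this sketch moves the lookup inside the replacement instead of placing an "unless" before the regex.

```perl
use strict;
use warnings;

# Assumed stop list; a real one would be loaded from whatever acronym
# list suits the text being processed.
my %dontchange = map { $_ => 1 } qw(NASA NATO UNESCO);

# Hypothetical helper: proper-case every all-caps word of four or more
# letters unless it appears in %dontchange.
sub proper_case_except_acronyms {
    my ($text) = @_;
    $text =~ s{\b([A-Z]{4,})\b}
              { exists $dontchange{$1} ? $1 : ucfirst(lc($1)) }eg;
    return $text;
}

print proper_case_except_acronyms('NASA has SOME WORDS in full CAPS'), "\n";
```

Note that the {4,} minimum already leaves three-letter words untouched, so only acronyms of four or more letters need to be listed.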

