fredvdv has asked for the wisdom of the Perl Monks concerning the following question:

I want to make the first letter of each word uppercase and the rest of the word lowercase, so I use this regexp:
$s = "word1 word2 word3";
$s =~ s/\b([A-Za-z]+)\b/\u\L$1/g;
Everything works until $s contains UTF-8 characters: then every character following a UTF-8 one is also uppercased. Is there a way to make \b aware of UTF-8 characters? Is there another regexp that does the same processing without mangling strings that contain UTF-8? Regards, Frederic.
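A likely explanation, assuming the text reaches the regex as undecoded UTF-8 bytes: each accented letter is then two bytes that match neither `[A-Za-z]` nor `\w`, so `\b` sees a word boundary right after it and `\u` fires on the next letter. A minimal sketch of one fix decodes the bytes with the core Encode module first and matches words with `\w`, which follows Unicode rules on decoded character strings. The helper name `title_case` and the sample text are made up for illustration.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Hypothetical helper: uppercase the first letter of every word and
# lowercase the rest. It expects a decoded *character* string, so that
# \w, \u and \L follow Unicode rules instead of byte rules.
sub title_case {
    my ($text) = @_;
    $text =~ s/(\w+)/\u\L$1/g;
    return $text;
}

# Simulated input as raw UTF-8 bytes, as it would arrive from a file or
# socket read without an :encoding layer ("\xc3\xa9" is e-acute in UTF-8).
my $bytes = "fr\xc3\xa9d\xc3\xa9ric dupont";

# Decode to characters before matching, encode again for output.
my $chars  = decode('UTF-8', $bytes);
my $result = title_case($chars);
print encode('UTF-8', $result), "\n";    # prints "Frédéric Dupont"
```

For data read from a file, opening it with an `:encoding(UTF-8)` layer does the decoding step on read instead of calling `decode` by hand.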

Re: regexp on utf8 string
by tradez (Pilgrim) on Sep 10, 2004 at 15:21 UTC
    Have you tried just using the ucfirst function, as in:
    $newWord = ucfirst($oldword);


    Tradez
    "Never underestimate the predictability of stupidity"
    - Bullet Tooth Tony, Snatch (2001)
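One caveat: `ucfirst` changes only the first character and leaves the rest alone, so by itself it does not lowercase the remainder as the question asks. The sketch below, assuming the word is already a character string (here via `use utf8` on the source literal), pairs it with `lc`; the sample word is invented.

```perl
use strict;
use warnings;
use utf8;                              # the string literal below is UTF-8 source
binmode STDOUT, ':encoding(UTF-8)';    # encode characters when printing

my $oldWord = 'éLOÏSE';                # sample word, made up

my $onlyFirst = ucfirst($oldWord);     # 'ÉLOÏSE': rest left as it was
my $newWord   = ucfirst(lc $oldWord);  # 'Éloïse': lowercase all, then uppercase first

print "$onlyFirst\n$newWord\n";
```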
Re: regexp on utf8 string
by borisz (Canon) on Sep 10, 2004 at 15:23 UTC
Re: regexp on utf8 string
by davido (Cardinal) on Sep 11, 2004 at 05:02 UTC

    The good thing about functions such as ucfirst is that they're locale-friendly. You don't really have to worry about what upper-case characters map to what lower-case characters in a given character set; ucfirst knows. Try this:

    my $string = "word1 word2 word3";
    $string = join ' ', map { ucfirst } split /\s+/, $string;

    It ought to do the trick; it's just one way to do it.
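Splitting on `/\s+/` and joining with a single space also normalizes the spacing: tabs and runs of spaces become one space. If the original whitespace should survive, one option is a capturing group in the `split` pattern, which makes `split` return the separators as well. A sketch, with a hypothetical helper name and sample string, assuming character strings via `use utf8`:

```perl
use strict;
use warnings;
use utf8;
binmode STDOUT, ':encoding(UTF-8)';

# Hypothetical helper. The capturing group makes split return the
# whitespace runs too, so join '' rebuilds the string with its original
# spacing; ucfirst lc leaves the whitespace pieces unchanged.
sub title_case_keep_space {
    my ($string) = @_;
    return join '', map { ucfirst lc } split /(\s+)/, $string;
}

print title_case_keep_space("émile\tZOLA  écrivain"), "\n";
# prints "Émile", a tab, then "Zola  Écrivain"
```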

    A slightly more robust regexp solution might include the following:

    use strict;
    use warnings;
    my $string = "word1 word2 word3";
    $string =~ s/(\w+)(?=\W|$)/ucfirst $1/eg;
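For the question's goal (lowercase the rest of each word too), a variant of this substitution might look like the sketch below. It assumes decoded character strings (`use utf8` for the literal) and drops the lookahead, since a greedy `\w+` already stops only at a non-word character or the end of the string. The `/e` form with `ucfirst lc` and the plain `\u\L` replacement should agree; the sample text is invented.

```perl
use strict;
use warnings;
use utf8;
binmode STDOUT, ':encoding(UTF-8)';

my $string = "ÉTÉ à l'ÎLE";

# /e evaluates the replacement as Perl code: lowercase, then uppercase first
(my $with_e = $string) =~ s/(\w+)/ucfirst lc $1/eg;

# Same effect using the \u and \L escapes inside the replacement string
(my $with_escapes = $string) =~ s/(\w+)/\u\L$1/g;

print "$with_e\n$with_escapes\n";    # both print "Été À L'Île"
```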

    Dave

      I finally managed to do the trick with these:

      Uppercase the first letter and lowercase the rest:
      $string = "\u\L$string";
      My string always contains spaces and/or dashes, so I uppercase each letter that follows one or more spaces or dashes:
      $string =~ s/([\s-]+)(.)/$1\u$2/g;
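Run on a decoded string, the two steps above behave roughly as in the sketch below (sample name invented, `use utf8` assumed for the literal). Only spaces and dashes count as separators here, so a letter after an apostrophe stays lowercase.

```perl
use strict;
use warnings;
use utf8;
binmode STDOUT, ':encoding(UTF-8)';

my $string = "jEAN-PIERRE  d'ÉMERY";

# Uppercase the first letter and lowercase the rest of the whole string
$string = "\u\L$string";

# Uppercase every character that follows one or more spaces or dashes
$string =~ s/([\s-]+)(.)/$1\u$2/g;

print "$string\n";    # prints "Jean-Pierre  D'émery"
```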
      Regards, Frederic
Re: regexp on utf8 string
by trammell (Priest) on Sep 10, 2004 at 15:26 UTC
    Perhaps you could work ucfirst into your solution?