Accented letter is not capitalised

Steve_BZ has asked for the wisdom of the Perl Monks concerning the following question:

Hi Guys,

I have a slightly strange issue here.

I am using a piece of code:

        substr($loc_temp_name,0,1) = uc(substr($loc_temp_name,0,1));

All the other letters are coming out capitalised nicely, but letters with accents are coming out in lower case still.

E.g. the diagnosis "úlcera" (ulcera with an accented u) does not get a capitalised "Ú" (U with accent).

Trying to fix with:

    substr($loc_temp_name,0,1) = uc(substr(decode("utf8",$loc_temp_name),0,1));

does indeed fix it but also prints all the other accented letters with accents.

I've tried various permutations of this theme but to no avail.

Ideas gratefully received.

Regards

Steve
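A minimal sketch of why the first version has no effect, assuming `$loc_temp_name` holds raw, undecoded UTF-8 bytes (the sample value below is hypothetical). On a byte string, `substr(..., 0, 1)` returns only the first byte of the two-byte sequence for "ú", so `uc` has nothing to capitalise. Splicing one decoded character back into an otherwise undecoded byte string mixes characters and bytes, so the safer pattern is to decode the whole string, capitalise, and encode the whole string again:

```perl
use strict;
use warnings;
use feature 'unicode_strings';
use Encode qw(decode encode);

# Hypothetical input: the raw UTF-8 bytes of "úlcera" (ú is C3 BA),
# e.g. as read from a handle that has no decoding layer.
my $loc_temp_name = "\xC3\xBAlcera";

# The original approach on bytes: the "first character" is only the
# lead byte 0xC3, and upper-casing it leaves the string unchanged.
my $unchanged = $loc_temp_name;
substr($unchanged, 0, 1) = uc(substr($unchanged, 0, 1));
printf "first unit: 0x%02X\n", ord substr($loc_temp_name, 0, 1);

# Decode the WHOLE string, capitalise, then encode the WHOLE string back.
my $text  = decode('UTF-8', $loc_temp_name);   # 6 characters, not 7 bytes
my $fixed = encode('UTF-8', ucfirst $text);    # bytes again, ready to print

# Show the resulting bytes in hex rather than trusting the terminal.
print join(' ', map { sprintf('%02X', ord $_) } split //, $fixed), "\n";
# prints: C3 9A 6C 63 65 72 61
```

Here "C3 9A" is the UTF-8 encoding of "Ú" (U+00DA).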


Re: Accented letter is not capitalised by moritz (Cardinal) on Feb 14, 2012 at 13:01 UTC
The basic rule is that you need to decode your strings before you apply text operations (like uc) to them, and then encode them when you do IO with them (i.e. print to STDOUT or a file). It also seems you're coming up with an extra-complicated way of writing ucfirst. So a piece of code that handles accents in UTF-8 input correctly could look like this:

    # make sure that strings coming from STDIN are decoded:
    binmode STDIN, ':encoding(UTF-8)';
    # make sure that strings written to STDOUT are encoded:
    binmode STDOUT, ':encoding(UTF-8)';
    my $line = <STDIN>;
    $line = ucfirst $line;
    print $line;

Please read the longer introduction for more background and information.

Perl 6 - second systems done right
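The same decode-on-input, encode-on-output rule applies to files: the `:encoding(UTF-8)` layer can be given directly to `open` instead of `binmode`. A minimal sketch, assuming UTF-8 input; the two sample lines and the in-memory filehandles (which stand in for real files here) are illustrative only:

```perl
use strict;
use warnings;
use feature 'unicode_strings';

# Hypothetical input: UTF-8 bytes for "úlcera" and "ácido", held in a
# scalar so the sketch needs no real file. A real program would open a path.
my $raw_in = "\xC3\xBAlcera\n\xC3\xA1cido\n";

# Decode on the way in via the :encoding(UTF-8) layer.
open my $in, '<:encoding(UTF-8)', \$raw_in or die "open in: $!";

# Encode on the way out via the same layer.
my $raw_out = '';
open my $out, '>:encoding(UTF-8)', \$raw_out or die "open out: $!";

while (my $line = <$in>) {
    print {$out} ucfirst $line;   # text operation on decoded characters
}

close $in  or die "close in: $!";
close $out or die "close out: $!";
```

After the loop, `$raw_out` holds the UTF-8 bytes of "Úlcera" and "Ácido"; no explicit `decode`/`encode` calls are needed.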
Re^2: Accented letter is not capitalised by Steve_BZ (Chaplain) on Feb 16, 2012 at 20:52 UTC
Hi Moritz,

Thanks for this. I feel I'm pretty much doing as you suggest. However, I tried the link you suggested and copied some code (from about the middle of the page). I changed it a little and get the following:

    #!/usr/bin/perl
    use warnings;
    use strict;
    use Encode qw(encode decode);

    my $enc = 'utf-8';   # This script is stored as UTF-8
    my $str = "úlcera\n";

    # Byte strings:
    print ucfirst $str;   # prints 'úlcera', ucfirst didn't have any effect.

    # Text strings:
    my $text_str = decode($enc, $str);
    $text_str = ucfirst $text_str;
    print encode($enc, $text_str);   # prints '?lcera', ucfirst as specified.

The capitalised "Ú" is displayed as a white square symbol under Windows or a question mark symbol under Linux. Do I need to configure anything on my machine to make this work?

Regards

Steve
Re^3: Accented letter is not capitalised by moritz (Cardinal) on Feb 16, 2012 at 21:07 UTC
Did you read the section Testing your Environment? Maybe your terminal isn't configured to use UTF-8, and your script isn't stored in UTF-8 either. I don't think that the Windows console is ever configured by default to work with UTF-8.
Re^3: Accented letter is not capitalised by tchrist (Pilgrim) on Feb 17, 2012 at 01:04 UTC
Somebody wrote:

    #!/usr/bin/perl
    use warnings;
    use strict;
    use Encode qw(encode decode);
    my $enc = 'utf-8'; # This script is stored as UTF-8
    my $str = "úlcera\n";
    # Byte strings:
    print ucfirst $str; # prints 'úlcera', ucfirst didn't have any effect.

Whoa, that’s never going to work! And you should (very very very almost) not ever need to be calling `encode/decode` yourself, either. Honest, this is really very easy. Watch:

    use utf8;
    use strict;
    use warnings;
    use warnings FATAL => "utf8";
    use feature "unicode_strings";  # or use v5.12 or superior
    use open qw(:std :utf8);
    print ucfirst("úlcera\n");

...very most assuredly does indeed print out `Úlcera`. Don’t go by appearances: trust only the numbers. Thus:

    $ perl ulstertest
    Úlcera
    $ perl ulstertest | uniquote -x
    \x{DA}lcera
    $ perl ulstertest | uniquote -v
    \N{LATIN CAPITAL LETTER U WITH ACUTE}lcera
    $ perl ulstertest | uniquote -b
    \xC3\x9Alcera

The outer pair of tests above risk only confusion; it is the inner pair that are wholly dispositive and convincing: trust the output of `uniquote -v` and `uniquote -x` to give you something you can actually read and depend on. Like I said, just play it by the numbers.

--tom
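`uniquote` is not part of the core Perl distribution. If it is not to hand, a rough core-only sketch in the same spirit prints non-ASCII characters as `\x{HEX}` or as `\N{NAME}` (via `charnames::viacode`), so the result can be checked by number instead of by how a terminal happens to render it. The helper names `show_hex` and `show_names` are hypothetical, and the sample is written as `"\x{FA}lcera"` so the sketch does not depend on how the file is saved; under `use utf8` the literal "úlcera" is the same string:

```perl
use strict;
use warnings;
use feature 'unicode_strings';   # Unicode rules for ucfirst on \xFA
use charnames ();                # provides charnames::viacode

# "\x{FA}lcera" is "úlcera" written with an escape.
my $word = ucfirst "\x{FA}lcera";

# Hypothetical helper, a rough stand-in for `uniquote -x`:
# non-ASCII characters become \x{HEX}.
sub show_hex {
    my ($s) = @_;
    return join '',
        map { ord($_) < 0x80 ? $_ : sprintf('\x{%X}', ord $_) } split //, $s;
}

# Hypothetical helper, a rough stand-in for `uniquote -v`:
# non-ASCII characters become \N{CHARACTER NAME}.
sub show_names {
    my ($s) = @_;
    return join '',
        map { ord($_) < 0x80 ? $_ : '\N{' . charnames::viacode(ord $_) . '}' }
        split //, $s;
}

my $hex   = show_hex($word);
my $named = show_names($word);
print "$hex\n";     # \x{DA}lcera
print "$named\n";   # \N{LATIN CAPITAL LETTER U WITH ACUTE}lcera
```

Both lines of output are plain ASCII, so they read the same on any console, including one not set up for UTF-8.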
Re^4: Accented letter is not capitalised by Steve_BZ (Chaplain) on Feb 17, 2012 at 16:05 UTC
Re^5: Accented letter is not capitalised by tchrist (Pilgrim) on Feb 17, 2012 at 20:11 UTC