gflohr has asked for the wisdom of the Perl Monks concerning the following question:

The Microsoft documentation for setlocale() suggests that, since 2018, they support UTF-8 locales and short (Unix-style) locale identifiers like "en_US". However, I cannot get that to work, not even with Microsoft's own example code. The linked page also contains the up-to-date documentation for setlocale() on Windows.

I'm running a recent Windows 10 system inside VirtualBox. I have tried to switch the locale with (Strawberry) Perl and with two versions of MinGW gcc. The result is always the same: I can set the locale to something like "German_Germany.1252", but none of "de_DE", "de-DE", "German_Germany.UTF-8", ".UTF8", "German_Germany.65001", ... work. 65001 is the "code page" for UTF-8.

Can anybody shed some light on this?

I'm using this C code for testing (run with "PROGRAMNAME LOCALE"):

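#include <locale.h>
#include <stdio.h>
#include <time.h>
#include <string.h>

int main(int argc, char *argv[])
{
    char date[256];
    time_t then = 1678658400; /* a date in March 2023 */
    /* without an argument, argv[1] is NULL and this only queries the locale */
    const char *locale = setlocale(LC_TIME, argv[1]);

    strftime(date, sizeof date, "%B", localtime(&then));
    /* setlocale() returns NULL on failure; strlen() returns a size_t */
    printf("%s: %s (%lu bytes)\n", locale ? locale : "(null)", date,
           (unsigned long)strlen(date));
    return 0;
}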

Or in Perl:

use v5.10;
use POSIX qw(LC_ALL setlocale strftime);

my $locale = $ARGV[0];
say "$locale: ", POSIX::setlocale(LC_ALL, $locale);
my $march = strftime("%B", 0, 0, 0, 1, 2, 123); # 1 March 2023 (mon is 0-based, year - 1900)
say $march;

On non-Windows systems, both versions, invoked as "./PROGRAM de_DE.UTF-8", print the German name of the month of March, "März", encoded in UTF-8.

Re: Using setlocale() on Windows with utf-8 support
by syphilis (Archbishop) on Jul 18, 2023 at 01:21 UTC
    my $locale = $ARGV[1];

    AIUI, you'll want $ARGV[0] not $ARGV[1].
    I've also added strictures and warnings to your script (to trigger any helpful diagnostics that might be lurking):
    use strict;
    use warnings;
    use v5.10;
    use POSIX qw(LC_ALL setlocale strftime);

    my $locale = $ARGV[0];
    say "$locale: ", POSIX::setlocale(LC_ALL, $locale);
    my $march = strftime("%B", 0, 0, 0, 1, 2, 123);
    say $march;
    On Windows 11, perl-5.38.0, I then get:
    D:\pscrpt>perl try.pl de_DE.UTF-8
    Use of uninitialized value in say at try.pl line 7.
    de_DE.UTF-8: March
    It seems that the POSIX::setlocale() call returns undef, i.e. it fails.
    The POSIX documentation suggests that the following one-liner should work:
    D:\>perl -MPOSIX -wle "$loc = POSIX::setlocale( &POSIX::LC_ALL, 'de' ); print $loc;"
    Use of uninitialized value $loc in print at -e line 1.
    That's not my idea of "working".
    I would raise an issue about this at https://github.com/Perl/perl5/issues.
    At least then you'll get feedback from people with some expertise regarding locale settings on Windows.

    BTW, for me, the C program you provided output:
    C: March (5 bytes)
    Cheers,
    Rob
Re: Using setlocale() on Windows with utf-8 support
by gflohr (Acolyte) on Jul 18, 2023 at 10:00 UTC

    Rob, locale identifiers are platform-dependent. The id "de" probably works nowhere; "de_DE" will work on most Unix systems if the locale "de_DE" is installed. On Windows you have to use "German" or "German_Germany", which are equivalent. But all that is clear and not the question.

    The question was: Is it possible to activate a UTF-8 locale on Windows? The Microsoft documentation claims that it is possible but how can that feature be used from Perl?

      The question was: Is it possible to activate a UTF-8 locale on Windows?

      It looks like it's probably working for me on Windows 11, but only if the C toolchain that built perl (or that builds your executable) is a Microsoft one.
      Here's a copy'n'paste (that doesn't render exactly as it appears) of what I get, having built your demo C program (into try.exe) using Visual Studio 2022:
      D:\C>try.exe German.utf8
      German_Germany.utf8: M├ñrz (5 bytes)
      And here's what I get using perl-5.38.0 that was built with the same Visual Studio 2022 compiler:
      D:\>perl -MPOSIX -wle "$loc = POSIX::setlocale( LC_ALL, 'German.utf8' ); print $loc;"
      German_Germany.utf8
      But if I use my perl-5.38.0 that was built with a mingw-w64 port of gcc-13.1.0, then I get:
      D:\>perl -MPOSIX -wle "$loc = POSIX::setlocale( LC_ALL, 'German.utf8' ); print $loc;"
      Use of uninitialized value $loc in print at -e line 1.
      And if I use that gcc-13.1.0 to build your C program into try_gcc.exe, then I get:
      D:\C>try_gcc.exe German.utf8
      (null): March (5 bytes)
      From which I deduce that the behavior you need has not yet been ported to the mingw-w64 toolchain.
      If you need it to work with the mingw-w64 compilers then you could make enquiries about that by (eg) posting to mingw-w64-public@lists.sourceforge.net .

      Cheers,
      Rob

        That was very helpful! Thanks!

        I will have a look at the mingw-w64 sources and maybe file an issue.

        Cheers,
        Guido