orrence has asked for the wisdom of the Perl Monks concerning the following question:

Hello all,

I am using a module that parses a text file containing German umlaut characters (äöü etc.). The module uses the 'use locale' pragma, and everything works fine when I run my script in an xterm. Problems arise when I try to do the same thing in a CGI script (Apache 1.3 server).

The script fails because the German umlaut characters (äöü etc.) are not matched by \w in the CGI environment, which causes some regexes to fail, even though I set

$ENV{LANG}       = 'de_DE@euro';
$ENV{LC_CTYPE}   = 'de_DE@euro';
$ENV{LC_COLLATE} = 'POSIX';

just as they are set in my bash environment. According to the documentation of 'use locale' in the Camel book, Perl uses the value of LC_CTYPE for regex character classes (such as \w) and the value of LC_COLLATE for string comparisons.
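A quick way to check which LC_CTYPE the perl process actually uses is sketched below. As far as I know, perl initialises its locale from the environment once, at startup, so assigning to %ENV afterwards only changes what child processes inherit. The sketch relies on the one-argument form of POSIX::setlocale, which queries the current value without changing it; the de_DE@euro value is simply the one from the question.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use POSIX qw(setlocale LC_CTYPE);

# One argument: query the current LC_CTYPE of this process, change nothing.
my $before = setlocale(LC_CTYPE);

# This only modifies the environment handed to child processes;
# it does not re-initialise the locale of the running perl.
$ENV{LC_CTYPE} = 'de_DE@euro';

my $after = setlocale(LC_CTYPE);

print "LC_CTYPE before: ", (defined $before ? $before : 'undef'), "\n";
print "LC_CTYPE after:  ", (defined $after  ? $after  : 'undef'), "\n";
```

Under Apache, where LANG and LC_* are typically not set for the server process, both lines would probably show the C locale.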

What can I do to tell Apache, or rather the CGI script, that I am in a German environment and want German special characters to be treated as alphanumeric characters?

Thanks,
Daniel.

P.S.: using perl 5.8.0 on a Linux box (SuSE 8.2, kernel 2.4.20)

Re: locale problems with cgi and apache
by Roger (Parson) on Nov 05, 2003 at 13:38 UTC
    Perhaps you should consider using Unicode strings. See the utf8 documentation on www.cpan.org for more info, or simply search Google for 'perl, utf, locale, german' and you will get hundreds of hits.
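A minimal sketch of the Unicode-string approach, assuming the input file is ISO-8859-15 (Latin-9) encoded, as the de_DE@euro locale suggests. Once the bytes are decoded into a character string with Encode::decode, \w follows Unicode rules and no longer depends on the process locale. The sample word "Grüße" is illustrative only.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Encode qw(decode);

# Bytes as they might appear in a Latin-9 text file:
# "Gr\xFC\xDFe" is "Grüße" (ü = 0xFC, ß = 0xDF).
my $bytes = "Gr\xFC\xDFe";

# Decode into a character string; \w now uses Unicode semantics,
# independent of LC_CTYPE and without 'use locale'.
my $chars = decode('iso-8859-15', $bytes);

print $chars =~ /^\w+$/ ? "all word characters\n" : "no match\n";
```

To read the file itself this way, it could be opened with an encoding layer, e.g. open(my $fh, '<:encoding(iso-8859-15)', $path), where $path is a placeholder and the encoding is again an assumption about the file.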

Re: locale problems with cgi and apache
by chromatic (Archbishop) on Nov 06, 2003 at 07:01 UTC

    When and where do you set these variables? I assume they must be set before locale loads. If you're not setting them in the server's parent process or in httpd.conf before running your program, you might try:

    BEGIN {
        $ENV{LANG}       = 'de_DE@euro';
        $ENV{LC_CTYPE}   = 'de_DE@euro';
        $ENV{LC_COLLATE} = 'POSIX';
    }
    use locale;
      Thank you for thinking about my problem. I found a solution using the module POSIX and its function setlocale like this:

      use POSIX qw(setlocale LC_CTYPE);
      setlocale(LC_CTYPE, 'de_DE@euro');

      Now it works the way I expect it to ...

      Daniel.
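Putting Daniel's fix together, a minimal CGI sketch might look like the following. It assumes the de_DE@euro locale is actually installed on the server (locale -a lists the available ones) and that the text is ISO-8859-15 encoded; the sample word and the warning message are illustrative only.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use POSIX qw(setlocale LC_CTYPE);
use locale;    # \w and friends follow the current LC_CTYPE

# Switch this process's character classification to German.
# setlocale returns undef if the requested locale is not available.
my $loc = setlocale(LC_CTYPE, 'de_DE@euro');
warn "could not set LC_CTYPE to de_DE\@euro\n" unless defined $loc;

print "Content-Type: text/plain\n\n";

# ISO-8859-15 bytes for "Grüße" (ü = 0xFC, ß = 0xDF).
my $word = "Gr\xFC\xDFe";
print $word =~ /^\w+$/ ? "matched\n" : "no match\n";
```

Checking the return value of setlocale is worthwhile in a CGI context: if the locale is missing, the call fails silently, and \w probably keeps the C locale's ASCII-only classification, which is exactly the symptom described in the question.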