leeericsson has asked for the wisdom of the Perl Monks concerning the following question:

Hi all, I have run into a string encoding problem in Perl. I want to read a string from the Windows command line and write it to a UTF-8 encoded file. The string could be in different languages, for example English, Chinese or German. As far as I know, the command-line string is encoded according to my host locale, for example gb2312, so I can decode $ARGV[0] as gb2312 and then encode it as UTF-8 before writing it to the file. This works fine for English and Chinese, but not for languages like German. How can I do the same thing for those languages, without asking the user to specify the encoding of the language used? Thanks and regards! Lee

Re: Encoding problem
by moritz (Cardinal) on Jan 12, 2010 at 11:49 UTC
    Encodings and languages are relatively independent. If you have a "universal" encoding like UTF-8, it can encode characters from all languages (provided they are in the Unicode character repertoire, which is the case for basically all living and many dead languages). And you don't have to special-case any language.

    I don't know exactly what characters are in gb2312, but if you want to use some characters that are not in it, you have to use a different encoding.

    If you use UTF-8 for English and German, you should be fine (but your program has to know this, of course).
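To make the point about character repertoires concrete, here is a minimal sketch that checks which sample strings survive an encode/decode round trip in UTF-8 and in gb2312. The sample strings and the helper name `round_trips` are made up for illustration, and the result for gb2312 depends on the mapping table shipped with your Encode installation.

```perl
use strict;
use warnings;
use Encode qw(encode decode);

# Hypothetical sample strings, written with \x{...} escapes so this
# source file itself stays plain ASCII.
my %samples = (
    english => "hello",
    german  => "Gr\x{FC}\x{DF}e",      # "Gruesse" spelled with u-umlaut and sharp s
    chinese => "\x{4F60}\x{597D}",     # "ni hao"
);

# Return true if $string survives an encode/decode round trip in $encoding.
sub round_trips {
    my ($encoding, $string) = @_;
    my $copy  = $string;    # encode() may modify its argument when CHECK is set
    my $bytes = eval { encode($encoding, $copy, Encode::FB_CROAK) };
    return 0 unless defined $bytes;    # some character has no mapping
    return decode($encoding, $bytes) eq $string ? 1 : 0;
}

for my $name (sort keys %samples) {
    for my $encoding ('UTF-8', 'gb2312') {
        printf "%-8s %-7s %s\n", $name, $encoding,
            round_trips($encoding, $samples{$name}) ? 'ok' : 'not representable';
    }
}
```

The German sample contains a sharp s, which is not part of the gb2312 repertoire, so that combination should be reported as not representable while UTF-8 handles all three.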

Re: Encoding problem
by desemondo (Hermit) on Jan 12, 2010 at 10:10 UTC
      Hi, here is my code:
      ############################
      use strict;
      use warnings;
      use Encode;
      # read a string from the command line
      my $text = $ARGV[0];
      # decode the input as gb2312, which is determined by my locale setting
      my $string = decode("gb2312", $text);
      my @chars = split //, $string;
      open(my $output, '>>', 'anUTF8File.txt') or die "Cannot open anUTF8File.txt: $!";
      foreach my $char (@chars)
      {
          # encode each character as UTF-8 and append it to the file
          print {$output} encode("UTF-8", $char);
      }
      close($output);
      ############################
      When I pass an English string on the command line, this works fine. Then I switched my keyboard layout to German and typed some German characters for $ARGV[0], and what was written to the file were strange characters like "?" or others.
        The problem is with the program that launches perl (and populates @ARGV) and displays the output (?????). On Win32 this is usually cmd.exe, and you have to configure cmd.exe separately to accept German input (chcp) and separately to display German properly (font settings).
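On the Perl side, one way to avoid hard-coding gb2312 is to ask the system which code page is in effect. The following is only a sketch: it assumes the installed Win32 module provides `Win32::GetACP()`, that @ARGV arrives encoded in that ANSI code page, and it falls back to UTF-8 on other platforms. The helper names `argv_encoding` and `to_utf8_bytes` are made up for illustration.

```perl
use strict;
use warnings;
use Encode qw(decode encode find_encoding);

# Guess the encoding of @ARGV. On Windows this asks the Win32 module for
# the ANSI code page (assumption: Win32::GetACP() is available); on other
# systems, or if the lookup fails, it falls back to UTF-8 (also an assumption).
sub argv_encoding {
    if ($^O eq 'MSWin32' && eval { require Win32; 1 }) {
        my $name = 'cp' . Win32::GetACP();
        return $name if find_encoding($name);
    }
    return 'UTF-8';
}

# Convert bytes in $encoding to UTF-8 bytes.
sub to_utf8_bytes {
    my ($bytes, $encoding) = @_;
    return encode('UTF-8', decode($encoding, $bytes));
}

if (@ARGV) {
    open(my $out, '>>', 'anUTF8File.txt')
        or die "Cannot open anUTF8File.txt: $!";
    binmode($out);    # the data is already UTF-8 bytes; no CRLF translation
    print {$out} to_utf8_bytes($ARGV[0], argv_encoding());
    close($out) or die "Cannot close anUTF8File.txt: $!";
}
```

This does not help if the characters were already replaced before perl started, which is what the "?" output suggests: a character that does not exist in the active code page cannot be recovered by decoding afterwards.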