kartlee1 has asked for the wisdom of the Perl Monks concerning the following question:

Hi People, I have an environment variable set in Windows as TEST=abc£, which uses the Windows-1252 code page. When I run a Perl program, 'test.pl', the value comes through properly. When I call another Perl program, 'test2.pl', from 'test1.pl', either by system(..) or Win32::Process, the value comes through garbled. Can someone explain why this happens and how to resolve it? I am using Perl 5.8. If my understanding is right, Perl internally uses UTF-8, so the initial process, 'test1.pl', received the value correctly converted from Windows-1252 to UTF-8. When we call another process, should we convert back to the Windows-1252 code page? -Kartlee

Re: Perl character encoding
by cdarke (Prior) on Mar 13, 2010 at 16:13 UTC
    I just tried this on 5.10.1 from cmd.exe and saw no difference between the two processes. Of course, this assumes that the text I see in my browser is exactly what you meant, since translation can occur there as well.

    The cmd.exe did not display the £ correctly, it gave:
    test1.pl: <abcú> test2.pl: <abcú>
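    You do not say exactly what "garbled" means. It might help to display the characters in hex, for example print unpack("H*", $ENV{TEST}), "\n"; before and after you pass them through the environment block, to see what the change really is. (printf "%x" expects a number, so applied to the string directly it would just print 0.)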

    I am interested in your perception of perl translating from Windows 1252 to utf-8 and back - can you cite the reference where you read that?

      Console applications and windowed applications* use different encodings, often cp1252 and cp850. The cp1252 encoding of "£" is the same as the cp850 encoding of "ú".

               Unicode  cp1252   cp850
      £        U+00A3   A3=0243  9C=0234
      ú        U+00FA   FA=0372  A3=0243

      * — I'm not sure what the actual distinction is.
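The overlap described above can be checked with a small sketch using the core Encode module. The helper name codepoint_for is made up for this illustration; the only assumption is that the byte in question is 0xA3, as in the table.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Hypothetical helper: interpret a single byte under the named code page
# and return the Unicode code point it maps to.
sub codepoint_for {
    my ($codepage, $byte) = @_;
    return ord(decode($codepage, $byte));
}

# The same byte, read under the two code pages discussed in this thread.
for my $codepage (qw(cp1252 cp850)) {
    printf "%-6s byte A3 -> U+%04X\n", $codepage, codepoint_for($codepage, "\xA3");
}
```

Under cp1252 the byte maps to U+00A3 (£); under cp850 it maps to U+00FA (ú), which matches what cmd.exe displayed.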

      test=c:\\temp\\abc£

      test.pl
      -------
      use Devel::Peek;
      print Dump $ENV{"TEST"};
      system("perl.exe c:\\temp\\test1.pl");

      test1.pl
      --------
      use Devel::Peek;
      print Dump $ENV{"TEST"};

      Running test.pl returns:

      SV = PVMG(0x1a22dcc) at 0x356fd4
        REFCNT = 1
        FLAGS = (SMG,RMG,POK,pPOK)
        IV = 0
        NV = 0
        PV = 0x1a2f4a4 "c:\\temp\\abc\243"\0
        CUR = 12
        LEN = 16
        MAGIC = 0x1a2f4cc
          MG_VIRTUAL = &PL_vtbl_envelem
          MG_TYPE = PERL_MAGIC_envelem(e)
          MG_LEN = 17
          MG_PTR = 0x1a2f4fc "TEST"
      SV = PVMG(0x1a22dcc) at 0x356fd4
        REFCNT = 1
        FLAGS = (SMG,RMG,POK,pPOK)
        IV = 0
        NV = 0
        PV = 0x1a2f4bc "c:\\temp\\abc\234"\0
        CUR = 12
        LEN = 16
        MAGIC = 0x1a2f4e4
          MG_VIRTUAL = &PL_vtbl_envelem
          MG_TYPE = PERL_MAGIC_envelem(e)
          MG_LEN = 17
          MG_PTR = 0x1a2f514 "TEST"
      -Kartlee

        Both contain the same still-encoded string. Your test doesn't demonstrate the corruption you mentioned.

        By the way, print Dump should be just Dump

Re: Perl character encoding
by BrowserUk (Patriarch) on Mar 13, 2010 at 19:52 UTC

    Intriguing. I see a similar difference:

    C:\test>set TEST=abc£

    C:\test>echo %TEST%
    abc£

    perl -MWin32::Console -E"say '1:', $ENV{TEST};say Win32::Console::OutputCP; system q[perl -MWin32::Console -E\"say '2:', $ENV{TEST}; say Win32::Console::OutputCP;\"]"
    1:abcú
    850
    2:abc£
    850

    And the difference seems to be internal to Perl somehow.


Re: Perl character encoding
by youlose (Scribe) on Mar 15, 2010 at 10:59 UTC
    You need to:
      1. decode the string from CP1252
      2. encode it to CP850
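A minimal sketch of those two steps with the core Encode module follows. It assumes the value really arrives as cp1252 bytes and that the console expects cp850; both are guesses based on this thread, so adjust the names to your system. The helper name ansi_to_oem is made up for this illustration.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Hypothetical helper: re-encode a byte string from the (assumed) ANSI
# code page cp1252 to the (assumed) console code page cp850.
sub ansi_to_oem {
    my ($bytes) = @_;
    my $chars = decode('cp1252', $bytes);   # step 1: bytes -> Perl characters
    return encode('cp850', $chars);         # step 2: characters -> console bytes
}

my $raw = "abc\xA3";   # stands in for $ENV{TEST}
printf "%s -> %s\n", unpack('H*', $raw), unpack('H*', ansi_to_oem($raw));
# prints "616263a3 -> 6162639c"
```

Printing the hex before and after makes the change visible regardless of how the console renders the characters.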