gweekly has asked for the wisdom of the Perl Monks concerning the following question:

There is a BUG in the Windows handling of the %ENV hash since perl-5.18
Specifically if a UTF8 variable is used to delete an entry
an aberrant entry appears in the hash.
Is this bug known?
Can you help find a fix?
Or workaround(s)?
Thank you!
# -*- perl -*- # badperl 25-Jan-2017 11:04 gweekly # This script attempts to expose the bug that if a utf8 variable is # used as the key to delete a hash entry (that doesn't already exist) # a non-deletable key comes into existence, in violation of the # documentation for delete that says "Setting a hash element to the # undefined value does not remove its key, but deleting it does..." # It also shows that something is definitely wrong, as the hash can # end up with two identical keys. # The DATA area shows the results for running this script, making it # evident the bug was introduced in perl-5.18 and has persisted since. use strict; use warnings; use Devel::Peek; # Dump use Encode; # decode print "Running Perl $]\n"; # Pre-condition if ( exists $ENV{SAM} ) { print "Note: removing SAM=$ENV{SAM} from the environment\n"; delete $ENV{SAM}; } my $utf8 = Encode::decode('utf8','SAM'); my $fixed = substr $utf8, 0; # Confirm $utf8 and $fixed both are eq 'SAM'; die 'utf8 is not SAM' if $utf8 ne 'SAM'; die 'fixed is not SAM' if $fixed ne 'SAM'; die 'fixed is not utf8' if $fixed ne $utf8; print "Deleting ENV{\$utf8} where utf8 eq SAM\n"; delete $ENV{$utf8}; # Here badness happens if ( defined $ENV{SAM} ) { die "WRONG: ENV{SAM} is defined: '$ENV{SAM}'\n"; } if ( defined $ENV{$utf8} ) { die "WRONG: ENV{\$utf8} is defined: '$ENV{$utf8}'\n"; } if ( exists $ENV{SAM} ) { die "WRONG: ENV{SAM} exists\n"; } print exists $ENV{$utf8} ? "WRONG: ENV{\$utf8} exists\n" : "OKAY: ENV{\$utf8} does not exist\n"; if ( my @sams = grep {$_ eq 'SAM'} keys %ENV ) { if ( @sams > 1 ) { die "Surpise: ENV has ". @sams ." SAM keys: @sams\n"; } else { print "WRONG: ENV has the key 'SAM' - @sams\n"; } if ( exists $ENV{$fixed} ) { die "Surprise: \$ENV{\$fixed} DOES exist\n"; } } print "Now, assign a new value:\n"; $ENV{$utf8}='newVal'; if ( ! defined $ENV{SAM} ) { die " ENV{SAM} is not defined\n"; } print defined $ENV{$utf8} ? "OKAY: ENV{\$utf8} is defined: '$ENV{$utf8 +}'\n" : "OKAY: ENV{\$utf8} is not defined\n"; if ( ! exists $ENV{SAM} ) { die " ENV{SAM} does not exist\n"; } print exists $ENV{$utf8} ? "OKAY: ENV{\$utf8} exists\n" : "OKAY: ENV{\$utf8} does not exist\n"; my @sams = grep {$_ eq 'SAM'} keys %ENV or die "No SAM keys?"; if ( @sams > 1 ) { print "WRONG: ENV has ". @sams ." SAM keys: @sams\n"; for ( @sams ) { Dump $_; my $ans = $ENV{$_} || '<undef>'; print " Value = $ans\n"; } } if ( ! exists $ENV{$fixed} ) { die "Surprise: \$ENV{\$fixed} does NOT exist\n"; } print "Now, delete the entry\n"; delete $ENV{$fixed}; if ( exists $ENV{$fixed} ) { die "Surprise: \$ENV{\$fixed} does exist\n"; } print "Done testing\n"; __END__ C:\Users\gweekly>c:\perl\bin\perl i:\bin\badperl Running Perl 5.008008 Deleting ENV{$utf8} where utf8 eq SAM OKAY: ENV{$utf8} does not exist Now, assign a new value: OKAY: ENV{$utf8} is defined: 'newVal' OKAY: ENV{$utf8} exists Now, delete the entry Done testing C:\Users\gweekly>c:\ActivePerl-5.14.2\bin\perl i:\bin\badperl Running Perl 5.014002 Deleting ENV{$utf8} where utf8 eq SAM OKAY: ENV{$utf8} does not exist Now, assign a new value: OKAY: ENV{$utf8} is defined: 'newVal' OKAY: ENV{$utf8} exists Now, delete the entry Done testing C:\Users\gweekly>c:\ActivePerl5.16.3\bin\perl i:\bin\badperl Running Perl 5.016003 Deleting ENV{$utf8} where utf8 eq SAM OKAY: ENV{$utf8} does not exist Now, assign a new value: OKAY: ENV{$utf8} is defined: 'newVal' OKAY: ENV{$utf8} exists Now, delete the entry Done testing C:\Users\gweekly>c:\ActivePerl5.18.1\bin\perl i:\bin\badperl Running Perl 5.018001 Deleting ENV{$utf8} where utf8 eq SAM WRONG: ENV{$utf8} exists WRONG: ENV has the key 'SAM' - SAM Now, assign a new value: OKAY: ENV{$utf8} is defined: 'newVal' OKAY: ENV{$utf8} exists WRONG: ENV has 2 SAM keys: SAM SAM SV = PV(0x2890b2c) at 0x5015b4 REFCNT = 2 FLAGS = (POK,IsCOW,pPOK,UTF8) PV = 0x288ea78 "SAM" [UTF8 "SAM"] CUR = 3 LEN = 0 Value = newVal SV = PV(0x2890b24) at 0x5015cc REFCNT = 2 FLAGS = (POK,IsCOW,pPOK) PV = 0x288e988 "SAM" CUR = 3 LEN = 0 Value = newVal Now, delete the entry Done testing C:\Users\gweekly>\\mathworks\hub\win64\apps\bat\perl\latest520\bin\per +l i:\bin\badperl Running Perl 5.020002 Deleting ENV{$utf8} where utf8 eq SAM WRONG: ENV{$utf8} exists WRONG: ENV has the key 'SAM' - SAM Now, assign a new value: OKAY: ENV{$utf8} is defined: 'newVal' OKAY: ENV{$utf8} exists WRONG: ENV has 2 SAM keys: SAM SAM SV = PV(0x23556c8) at 0x3f1970 REFCNT = 2 FLAGS = (POK,IsCOW,pPOK) PV = 0x2361e78 "SAM" CUR = 3 LEN = 0 Value = newVal SV = PV(0x23556b8) at 0x3f1910 REFCNT = 2 FLAGS = (POK,IsCOW,pPOK,UTF8) PV = 0x2361f68 "SAM" [UTF8 "SAM"] CUR = 3 LEN = 0 Value = newVal Now, delete the entry Done testing C:\Users\gweekly>c:\Strawberry-perl-5.20.2.1\perl\bin\perl i:\bin\badp +erl Running Perl 5.020002 Deleting ENV{$utf8} where utf8 eq SAM WRONG: ENV{$utf8} exists WRONG: ENV has the key 'SAM' - SAM Now, assign a new value: OKAY: ENV{$utf8} is defined: 'newVal' OKAY: ENV{$utf8} exists WRONG: ENV has 2 SAM keys: SAM SAM SV = PV(0x25562b8) at 0x5dd698 REFCNT = 2 FLAGS = (POK,IsCOW,pPOK) PV = 0x2527e58 "SAM" CUR = 3 LEN = 0 Value = newVal SV = PV(0x25562a8) at 0x5dd6c8 REFCNT = 2 FLAGS = (POK,IsCOW,pPOK,UTF8) PV = 0x2527db8 "SAM" [UTF8 "SAM"] CUR = 3 LEN = 0 Value = newVal Now, delete the entry Done testing C:\Users\gweekly>c:\perl64\bin\perl i:\bin\badperl Running Perl 5.024001 Deleting ENV{$utf8} where utf8 eq SAM WRONG: ENV{$utf8} exists WRONG: ENV has the key 'SAM' - SAM Now, assign a new value: OKAY: ENV{$utf8} is defined: 'newVal' OKAY: ENV{$utf8} exists WRONG: ENV has 2 SAM keys: SAM SAM SV = PV(0x3ed3d8) at 0x30f488 REFCNT = 2 FLAGS = (POK,IsCOW,pPOK,UTF8) PV = 0x30b8b8 "SAM" [UTF8 "SAM"] CUR = 3 LEN = 0 Value = newVal SV = PV(0x3ed3e8) at 0x30f470 REFCNT = 2 FLAGS = (POK,IsCOW,pPOK) PV = 0x23bc458 "SAM" CUR = 3 LEN = 0 Value = newVal Now, delete the entry Done testing

Replies are listed 'Best First'.
Re: Windows %ENV and utf8
by Athanasius (Archbishop) on Jan 31, 2017 at 04:53 UTC

    Hello gweekly,

    [Cross-reference: LibXML and %ENV in 5.18 on Windows.]

    Received in a /msg:

    Should I do anything more in hopes of getting someone to look deeper into debugging this (eg: how to contact the code author, or submit a bug report?)?

    If the line my $utf8 = Encode::decode('utf8', 'SAM'); is replaced with my $utf8 = 'SAM';, then the problem goes away. So, at first sight it would appear that the bug is in the Encode module. However, on closer inspection it becomes clear that the problem derives from the way Perl itself handles the built-in %ENV hash.1 So my best advice is to submit a bug report re Perl itself. Follow the directions in perlbug and include a reference to this PerlMonks thread: http://perlmonks.pair.com/index.pl?node_id=1180341.

    Hope that helps,


    1Of possible relevance is the “Defined values stored in environment are forced to byte strings” note in perl5180delta:

    A value stored in an environment variable has always been stringified when inherited by child processes.

    In this release, when assigning to %ENV, values are immediately stringified, and converted to be only a byte string.

    However, this mentions %ENV values only, whereas the bug pertains to utf8-encoded keys.

    Athanasius <°(((><contra mundum Iustus alius egestas vitae, eros Piratica,

      If I can add something i remeber tye explaining me that on windows %ENV keys are case retentive: this was made because on such OS ENV is implemented in this way so %ENV reflect the behaviour.

      Consider the following two examples:

      set foo=bar & set FOO=BAR & set FOO foo=BAR set FOO=BAR & set FOO=bar & set FOO foo=bar

      So for sure also %ENV keys (not just values) must be treated as special, very funny and tricky, strings. In short: do not do it! keep %ENV keys uppercase ASCII plain strings and values as close as possile to a plain ASCII string

      L*

      There are no rules, there are no thumbs..
      Reinvent the wheel, then learn The Wheel; may be one day you reinvent one of THE WHEELS.
      Thank you. Submitted https://rt.perl.org/Ticket/Display.html?id=130683