perl-diddler has asked for the wisdom of the Perl Monks concerning the following question:

I was told a bit ago, after posting this on ActiveState's Perl list and on the perl-users list that I might bring this problem to the perl monks, as they may have greatly increased wisdom and insight into rooting out this problem.

I wanted to use perl to manipulate the registry on Windows XP. According to Microsoft's website "Key names are not case sensitive. Key names cannot include a backslash (\), but any other printable or unprintable character can be used."
(http://msdn.microsoft.com/library/en-us/sysinfo/base/regcreatekey.asp).

Depending on the locale in effect at the time you create the key, it seems virtually any keyname is possible. This also supports the creation of multi-byte characterset keys and keynames or valuenames with unprintable characters.

I wrote a simple routine to "iterate" all the keynames and their values. This worked fine for the HKEY_CLASSES sub-key. However for HKEY_CURRENT_USER (me), Microsoft's Sound-font selector for system actions had created some odd keynames.

Now one might say "oh, just delete those keys, they are of no use, anyway", but, I answer -- they are of excellent use: they point out a bug in the interface.

To date, no one knows exactly where the bug is -- if it is in the Win32 routines or if it is in Perl. I'm thinking there is a 30% chance it's in Perl and not specific but I don't know enough perl to verify this.

I do know that others rewrote my iteration routine and one person tried the older ("deprecated") Win32::Registry functions and none were able to iterate the suspect keys.

I wrote a simple test-case registry file that one can use to create the problem keynames. NOTE: this example file won't harm one's registry, and the keys it creates can easily be deleted in the Registry Editor.

--- Reg5 file to create sample keys: ---
Windows Registry Editor Version 5.00

[HKEY_CURRENT_USER\_ATEST_\.Default\Curren0]
@="myvalue1"

[HKEY_CURRENT_USER\_ATEST_\.Default\current0?]
@="myvalue2"

[HKEY_CURRENT_USER\_ATEST_\.Default\current0肼]
@="myvalue3"

[HKEY_CURRENT_USER\_ATEST_\AppGPFault\Curren0]
@="myvalue4"

[HKEY_CURRENT_USER\_ATEST_\AppGPFault\current0肼]
@="myvalue5"
-------------
Just to "get it right" in case it doesn't display/store or copy correctly above, here is the same file encoded using uuencode:
begin 640 reg5-format.reg
M__Y7`&D`;@!D`&\`=P!S`"``4@!E`&<`:0!S`'0`<@!Y`"``10!D`&D`=`!O
M`'(`(`!6`&4`<@!S`&D`;P!N`"``-0`N`#``,``-``H`#0`*`%L`2`!+`$4`
M60!?`$,`50!2`%(`10!.`%0`7P!5`%,`10!2`%P`7P!!`%0`10!3`%0`7P!<
M`"X`1`!E`&8`80!U`&P`=`!<`$,`=0!R`'(`90!N`#``!@!=``T`"@!``#T`
M(@!M`'D`=@!A`&P`=0!E`#$`(@`-``H`#0`*`%L`2`!+`$4`60!?`$,`50!2
M`%(`10!.`%0`7P!5`%,`10!2`%P`7P!!`%0`10!3`%0`7P!<`"X`1`!E`&8`
M80!U`&P`=`!<`&,`=0!R`'(`90!N`'0`,``_`%T`#0`*`$``/0`B`&T`>0!V
M`&$`;`!U`&4`,@`B``T`"@`-``H`6P!(`$L`10!9`%\`0P!5`%(`4@!%`$X`
M5`!?`%4`4P!%`%(`7`!?`$$`5`!%`%,`5`!?`%P`+@!$`&4`9@!A`'4`;`!T
M`%P`8P!U`'(`<@!E`&X`=``P`+R`70`-``H`0``]`"(`;0!Y`'8`80!L`'4`
M90`S`"(`#0`*``T`"@!;`$@`2P!%`%D`7P!#`%4`4@!2`$4`3@!4`%\`50!3
M`$4`4@!<`%\`00!4`$4`4P!4`%\`7`!!`'``<`!'`%``1@!A`'4`;`!T`%P`
M0P!U`'(`<@!E`&X`,``&`%T`#0`*`$``/0`B`&T`>0!V`&$`;`!U`&4`-``B
M``T`"@`-``H`6P!(`$L`10!9`%\`0P!5`%(`4@!%`$X`5`!?`%4`4P!%`%(`
M7`!?`$$`5`!%`%,`5`!?`%P`00!P`'``1P!0`$8`80!U`&P`=`!<`&,`=0!R
M`'(`90!N`'0`,`"\@%T`#0`*`$``/0`B`&T`>0!V`&$`;`!U`&4`-0`B``T`
&"@`-``H`
`
end
------- here is the perl program to list the above key:
#!/perl/bin/perl -w
use UTF8;
use Win32::TieRegistry 0.24 ;
#"Classes" for HKEY_CLASSES_ROOT
#"CUser" for HKEY_CURRENT_USER
#"LMachine" for HKEY_LOCAL_MACHINE
#"Users" for HKEY_USERS
#"CConfig" for HKEY_CURRENT_CONFIG
select STDERR;$|=1;
select STDOUT;$|=1;
my $col=1;
$keyname='CUser\\_ATEST_';
print_key($Registry->{$keyname},$keyname,0);

sub print_key {
    my ($cur_key, $keyname, $level)=@_;
    if ($col) { print "\n"; $col=0; }
    print " " x $level; $col=2;
    print "$keyname\\";my $nospaces=1;
    foreach $member ($cur_key->MemberNames) {
        if ($member =~ m|^(.+)\\$|) {
            $keyname=$1;
            if ($col) { print "\n"; $col=0; }
            $nospaces=0;
            print_key($cur_key->{"$keyname\\"}, "$keyname",$level+1);
        } elsif ($member =~ m|^\\(.*)$|) {
            $valuename=$1;
            $value=$cur_key->GetValue("$valuename");
            $value=(defined $value)?$value:"(value not set)";
            $value=(length($value))?$value:"(null)";
            if (!$nospaces) {print " " x ($level+1); $col+=2;};
            $nospaces=0;
            print "$valuename => $value";
            if ($col) { print "\n"; $col=0; }
        }
    }
}

----- The above program dies due to the error. Here is the "broken" output:
CUser\_ATEST_\
 .Default\
  Curren0♠\ => myvalue1
  current0?\ => myvalue2
  current0?\ => myvalue2
 AppGPFault\
  Curren0♠\ => myvalue4
  current0?\Can't call method "MemberNames" on an undefined value at ./test.pl line 34.

As it doesn't seem specific to TieRegistry and is in the Registry function of Win32 as well, the problem would seem to lie either in "Win32" or in perl. As I mentioned before, I'd give it about 2:1 odds of being in perl but I've no idea how Win32 works.

As it stands now, I have this "vague" uneasy feeling about how robust perl is in handling unexpected sequences in string data.

Many thanks to anyone who can isolate this problem....

Linda

Re: possible perl bug or, at least, Win32::[TieRegistry|Registry] bug with difficult keynames (limitation)
by tye (Sage) on May 12, 2005 at 03:25 UTC

    Win32::TieRegistry only deals with 8-bit-character strings as it uses the *A APIs not the *W APIs. Win32API::Registry exposes the *W APIs, but you'll have to do some extra work to translate the information you give and get [1].

    Those two registry modules were written before Perl had much of a clue about Unicode. Now that Perl has settled on UTF-8 (or something very close to UTF-8, I hear) and has some decent support for it, it'd be a good idea to update Win32::TieRegistry to take advantage of it, at least optionally.

    The script you wrote dies because it (MS's *A API) translates the key name as best it can into your current 8-bit locale. That is, it translates the name into something similar but not the same. So trying to open a key by the translated name finds no match. You don't check for the open failing so when you try to use the return value you get the error you showed.
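A minimal sketch of that failure mode, using a plain hash as a hypothetical stand-in for a registry branch (it does not call Win32::TieRegistry or any Windows API). It assumes the 8-bit locale is cp1252 and uses Encode's substitution character to mimic the lossy translation; the names `names_8bit` and `describe` are made up for illustration.

```perl
use strict;
use warnings;
use Encode qw(encode);

# Hypothetical stand-ins for the two test branches: real key name => default value.
my %default_branch = (
    "current0?"        => 'myvalue2',
    "current0\x{80BC}" => 'myvalue3',
);
my %gpfault_branch = (
    "current0\x{80BC}" => 'myvalue5',
);

# Mimic an 8-bit listing: each name is translated into cp1252 (assumed locale),
# and any character that does not fit is replaced by "?".
sub names_8bit {
    my ($branch) = @_;
    return map { encode('cp1252', $_) } sort keys %$branch;
}

# Try to reopen each key by its translated name, checking the result
# before using it instead of assuming the open succeeded.
sub describe {
    my ($branch) = @_;
    my @lines;
    for my $name (names_8bit($branch)) {
        my $value = $branch->{$name};
        push @lines, defined $value
            ? "$name => $value"
            : "$name => (cannot open by this name)";
    }
    return @lines;
}

print "$_\n" for describe(\%default_branch), describe(\%gpfault_branch);
```

In the first branch both names collapse to `current0?`, so the same key is opened twice; in the second branch the translated name matches nothing, which is the case where an unchecked return value leads to the "undefined value" error.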

    - tye        

    [1] The *W APIs use what Microsoft calls "UNICODE", which is really 16-bit characters, somewhat like UTF-16 treated as if it were fixed-width. So it matches UTF-16 so long as you only use characters that fit in a single 16-bit unit (code points up to U+FFFF). Some will even tell you that UTF-16 is fixed-width. They are just a little confused (UTF-16 "is fixed-width" only so long as you stay within that range).

    Unicode characters that require more than 16 bits are encoded by UTF-16 as a surrogate pair, that is, two 16-bit units (32 bits in total). They don't fit in a single character of a strictly fixed-width 16-bit format.

    Microsoft also supports UTF-8, but only as one more multi-byte code page ("wide character" is its name for the 16-bit format). It mostly just lets you translate UTF-8 to and from "UNICODE"; it doesn't really provide APIs that deal with it in, for example, file names or registry keys.
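To make the footnote's point about 16-bit units concrete, here is a small sketch using the core Encode module. The helper name `utf16_units` is made up for illustration; U+80BC is the character from the test key names, and U+10400 is just an arbitrary code point above U+FFFF.

```perl
use strict;
use warnings;
use Encode qw(encode);

# Number of 16-bit code units UTF-16 needs for a single character.
sub utf16_units {
    my ($char) = @_;
    return length(encode('UTF-16LE', $char)) / 2;
}

# Hex dump of the UTF-16LE bytes, to compare against a dump of a .reg file.
sub utf16le_hex {
    my ($char) = @_;
    return unpack 'H*', encode('UTF-16LE', $char);
}

for my $char ("A", "\x{80BC}", "\x{10400}") {
    printf "U+%04X: %d unit(s), bytes %s\n",
        ord($char), utf16_units($char), utf16le_hex($char);
}
```

Characters up to U+FFFF come out as one unit each, while the last one needs a surrogate pair, which is where a fixed-width 16-bit view and real UTF-16 part ways.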

      Someone else already had a version similar to mine that printed the error values. The error value didn't seem to make much sense:
      ERR: open CUser\AppEvents\Schemes\Apps\.Default\AppGPFault\current0?: The system could not find the environment option that was entered
      The problem may be that the TieRegistry routines can't handle MS UCS-16 characters -- i.e. perhaps, somehow, such characters are being returned. I know that UTF-16 encoding doesn't require the 16th bit to be set -- you can see it when you dump a UCS-16 file -- if it was "ascii", then it has a "zero" in the high byte. It seems any value other than 0 in the high byte would indicate something other than a simple ascii char.

      In looking at my sample reg file, it looks like the "difficult" characters are simply UCS-16 encodings for \r and \n.

      Any idea of who owns "TieRegistry" or "Registry" who might update them? I "guess"...this seems kludgey, but on NT platforms, UCS-16 encoding should be used for registry terms. This would seem to be bad if one wants to use UTF-8 locale settings, as an attempt at conversion would need to be done (UCS16<->UTF8). Regardless, in non-ascii locales (i.e. most installations), a translation would need to be attempted to UCS-16 and vice-versa. Then errors would have to be returned for 'encoding errors'.

      Grumble...since any character is valid in a registry key/value name except "\", one can't just try to store user strings as binary data (might collide with a "\").

      Note -- I tried my program without the "use UTF8;". It fails as well. It's most likely the use of the "W" APIs that is central to the problem.

      Thanks & thanks in advance if you know where to find the Win32 Tiereg & Registry maintainers...will try reposting this info in module-authors...

      Linda

        Please reread my note. The problem is that the *A APIs are used, RegOpenKeyExA() not RegOpenKeyExW(). It is simply a limitation that only 8-bit characters are supported by these Microsoft APIs. The problem has nothing to do with Perl or the Perl modules, except in that those modules choose to use the *A APIs. The error you get is exactly as expected given how the *A APIs work.

        Switching Win32::TieRegistry to support the *W APIs would be quite a bit of work, would add conversion overhead in quite a few places, and would need to only be done optionally (due to overhead and because I don't trust Perl to prevent people from noticing that they are suddenly getting UTF-8 strings instead of 8-bit strings).

        From what you've written, it sounds like you don't have a real need for this functionality anyway, it just being a limitation that you discovered more out of curiosity than pressing need. I've seen no other requests for supporting out-of-locale characters in Win32::TieRegistry to date. That doesn't mean it shouldn't be done or that it won't be done, it just affects what priority I'm likely to assign to it.

        Win32API::Registry already supports the *W APIs, but you'll have to do some extra work, as I noted. pack and unpack should handle it, if you've got a version of Perl that supports Unicode well. Probably:

        my $utf8 = pack "U*", unpack "S*", $ucs16;
        my $ucs16 = pack "S*", unpack( "U*", $utf8 ), 0;

        Except you'll probably need to chop the "\0" off the end of $utf8 after that first line. But I haven't tested any of that code.
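As an alternative to the pack/unpack lines, here is a sketch of the same two conversions using the core Encode module instead (a different technique, chosen because it also copes with surrogate pairs). It assumes the wide buffers are little-endian UTF-16 with a trailing NUL; the helper names `from_wide` and `to_wide` are made up for illustration, and nothing here calls Win32API::Registry itself.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# UTF-16LE buffer (the shape a *W call is assumed to hand back)
# -> Perl character string, with the trailing NUL removed.
sub from_wide {
    my ($wide) = @_;
    my $str = decode('UTF-16LE', $wide);
    $str =~ s/\0\z//;    # drop the NUL terminator, if present
    return $str;
}

# Perl character string -> NUL-terminated UTF-16LE buffer.
sub to_wide {
    my ($str) = @_;
    return encode('UTF-16LE', $str . "\0");
}

# Round-trip one of the test key names.
my $name = "current0\x{80BC}";
my $wide = to_wide($name);
print "round trip ok\n" if from_wide($wide) eq $name;
```

The resulting character strings carry Perl's UTF-8 flag, so any code that later prints them would need an output layer such as `binmode(STDOUT, ':encoding(UTF-8)')`.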

        - tye