in reply to possible perl bug or, at least, Win32::[TieRegistry|Registry] bug with difficult keynames

Win32::TieRegistry only deals with 8-bit-character strings, as it uses the *A APIs, not the *W APIs. Win32API::Registry exposes the *W APIs, but you'll have to do some extra work to translate the information you give and get [1].

Those two registry modules were written before Perl had much of a clue about Unicode. Now that Perl has settled on UTF-8 (or something very close to UTF-8, I hear) and has some decent support for it, it'd be a good idea to update Win32::TieRegistry to take advantage of it, at least optionally.

The script you wrote dies because it (MS's *A API) translates the key name as best it can into your current 8-bit locale. That is, it translates the name into something similar but not the same. So trying to open a key by the translated name finds no match. You don't check for the open failing so when you try to use the return value you get the error you showed.
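To make the lossy translation concrete, here is a minimal sketch using the core Encode module. It only approximates what the *A APIs do (Windows uses its own best-fit mapping, and cp1252 is just an assumed locale); the key name and the helper name ansi_view are hypothetical.

```perl
use strict;
use warnings;
use Encode qw(encode decode);

# Hypothetical helper: squeeze a name into an 8-bit code page, roughly
# the way an *A call would. Encode substitutes "?" for any character
# the code page cannot represent.
sub ansi_view {
    my ($name) = @_;
    return encode('cp1252', $name);
}

# Hypothetical key name with one character (U+0D0A) outside cp1252.
my $real_name = "current\x{0D0A}";
my $ansi_name = ansi_view($real_name);

# Decoding the 8-bit name cannot bring the original character back.
my $reopened = decode('cp1252', $ansi_name);

print "8-bit name: $ansi_name\n";
print $reopened eq $real_name
    ? "names match\n"
    : "names differ, so opening by the translated name finds nothing\n";
```

The translated name is close to the real one but not equal to it, which is why the open fails and why the return value needs checking before use.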

- tye        

1 The *W APIs use what Microsoft calls "UNICODE", which is really strings of 16-bit characters, somewhat like UTF-16 except that it was designed as fixed-width (what the standards call UCS-2). So it matches UTF-16 so long as you only use characters that fit in 16 bits (code points up to U+FFFF, excluding the surrogate range U+D800 to U+DFFF). Some will even tell you that UTF-16 is fixed-width. They are just a little confused (UTF-16 "is fixed-width" so long as you only use characters that fit within 16 bits).

If you try to use Unicode characters that require more than 16 bits, then UTF-16 encodes each of them as a surrogate pair of two 16-bit units (32 bits in all), while a fixed-width 16-bit format such as Microsoft's original "UNICODE" has no way to represent them as a single character.
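A quick way to see where UTF-16 stops being "fixed-width" is to count the 16-bit units it uses per character. This is only a sketch with the core Encode module; utf16_units is a hypothetical helper name, and the sample code points are arbitrary.

```perl
use strict;
use warnings;
use Encode qw(encode);

# Hypothetical helper: return the 16-bit code units that UTF-16 uses
# for a single character (little-endian, as Windows stores them).
sub utf16_units {
    my ($char) = @_;
    return unpack 'v*', encode('UTF-16LE', $char);
}

# Two characters that fit in 8 bits, one that needs all 16 bits,
# and one beyond U+FFFF that needs a surrogate pair.
for my $cp (0x41, 0xE9, 0xFFE5, 0x1F600) {
    my @units = utf16_units(chr $cp);
    printf "U+%04X -> %d unit(s): %s\n",
        $cp, scalar @units, join ' ', map { sprintf '%04X', $_ } @units;
}
```

Only the last code point takes two units; everything up to U+FFFF takes one.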

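Microsoft also supports UTF-8, which it treats as just another "multibyte" code page (CP_UTF8; "wide character" is its name for the 16-bit format), but it mostly just lets you translate that to and from "UNICODE" with MultiByteToWideChar() and WideCharToMultiByte(); it doesn't really provide APIs that deal with it in, for example, file names or registry keys.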

Re^2: possible perl bug or, at least, Win32::[TieRegistry|Registry] bug with difficult keynames (limitation)
by perl-diddler (Chaplain) on May 15, 2005 at 19:37 UTC
    Someone else already had a version similar to mine that printed the error values. The error value didn't seem to make much sense:
    ERR: open CUser\AppEvents\Schemes\Apps\.Default\AppGPFault\current0?: The system could not find the environment option that was entered
    The problem may be that the TieRegistry routines can't handle MS UCS-2 (16-bit) characters -- i.e. perhaps, somehow, such characters are being returned. I know that UTF-16 encoding doesn't require the 16th bit to be set -- you can see it when you dump a UCS-2 file. If a character is "ascii", then it has a "zero" in the high byte. It seems any value other than 0 in the high byte would indicate something other than a simple ascii char.

    In looking at my sample reg file, it looks like the "difficult" characters are simply UCS-2 encodings for \r and \n.

    Any idea who owns "TieRegistry" or "Registry" and might update them? I "guess" (this seems kludgey) that on NT platforms, UCS-2 encoding should be used for registry terms. This would seem to be bad if one wants to use UTF-8 locale settings, as a conversion would need to be attempted (UCS-2 <-> UTF-8). Regardless, in non-ascii locales (i.e. most installations), a translation to UCS-2 and vice versa would need to be attempted. Then errors would have to be returned for 'encoding errors'.

    Grumble...since any character is valid in a registry key/value name except "\", one can't just try to store user strings as binary data (might collide with a "\").

    Note -- I tried my program without the "use UTF8;" line. It fails as well. It's most likely the use of the "W" APIs that is central to the problem.

    Thanks & thanks in advance if you know where to find the Win32 Tiereg & Registry maintainers...will try reposting this info in module-authors...

    Linda

      Please reread my note. The problem is that the *A APIs are used, RegOpenKeyExA() not RegOpenKeyExW(). It is simply a limitation that only 8-bit characters are supported by these Microsoft APIs. The problem has nothing to do with Perl or the Perl modules, except in that those modules choose to use the *A APIs. The error you get is exactly as expected given how the *A APIs work.

      Switching Win32::TieRegistry to support the *W APIs would be quite a bit of work, would add conversion overhead in quite a few places, and would need to be done only optionally (due to the overhead and because I don't trust Perl to prevent people from noticing that they are suddenly getting UTF-8 strings instead of 8-bit strings).

      From what you've written, it sounds like you don't have a real need for this functionality anyway, it just being a limitation that you discovered more out of curiosity than pressing need. I've seen no other requests for supporting out-of-locale characters in Win32::TieRegistry to date. That doesn't mean it shouldn't be done or that it won't be done, it just affects what priority I'm likely to assign to it.

      Win32API::Registry already supports the *W APIs, but you'll have to do some extra work, as I noted. pack and unpack should handle it, if you've got a version of Perl that supports Unicode well. Probably:

      my $utf8  = pack "U*", unpack "S*", $ucs16;        # 16-bit buffer -> Perl string
      my $ucs16 = pack "S*", unpack( "U*", $utf8 ), 0;   # Perl string -> 16-bit buffer plus trailing NUL

      Except you'll probably need to chop the "\0" off the end of $utf8 after that first line. But I haven't tested any of that code.
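      If your Perl has the Encode module (core since 5.8), the same round trip can be sketched without pack and unpack. The helper names to_wide and from_wide are hypothetical, the key name is made up, and this assumes the buffer is little-endian, as it is on Windows:

```perl
use strict;
use warnings;
use Encode qw(encode decode);

# Hypothetical helper: Perl string -> NUL-terminated UTF-16LE buffer,
# the shape a *W API expects to receive.
sub to_wide {
    my ($str) = @_;
    return encode('UTF-16LE', $str . "\0");
}

# Hypothetical helper: UTF-16LE buffer handed back by a *W API ->
# Perl string, with the trailing NUL terminator(s) removed.
sub from_wide {
    my ($buf) = @_;
    my $str = decode('UTF-16LE', $buf);
    $str =~ s/\0+\z//;
    return $str;
}

my $name = "current\x{0D0A}";    # made-up key name with a non-ASCII character
my $wide = to_wide($name);

printf "%d bytes: %s\n", length($wide),
    join ' ', map { sprintf '%04X', $_ } unpack 'v*', $wide;
print from_wide($wide) eq $name ? "round trip ok\n" : "round trip FAILED\n";
```

      Unlike "S*", which uses the machine's native byte order, 'UTF-16LE' pins the byte order explicitly, and Encode also copes with characters above U+FFFF.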

      - tye