waynerasm has asked for the wisdom of the Perl Monks concerning the following question:

I am using Perl 5.22.1 (Ubuntu 16.04 LTS) and am trying to understand why referencing a hash value changes the internal type of the value and how to identify what the type is in the first place. By way of example:


perl -e'use JSON::XS; %x=(a=>1);print $x{a},"~",encode_json(\%x),"\n"'

outputs 1~{"a":1}


perl -e'use JSON::XS; %x=(a=>"1"); print $x{a},"~",encode_json(\%x),"\n"'

outputs 1~{"a":"1"}


However,


perl -e'use JSON::XS; %x=(a=>1); print length($x{a}),"~",encode_json(\%x),"\n"'

outputs 1~{"a":"1"} (NB: the hash was still created with a=>1)


yet ostensibly no explicit change was made to $x{a}. Furthermore, the reverse does not apply, i.e. I can't change it back merely by using the value: for example, print length($x{a}),$x{a}+0,... does not lead to encode_json(\%x) returning {"a":1}

So BUG, or lack of understanding on my part? And how does one determine that $x{a} is a number versus a string?

As always, thanks in advance for any understanding shared.
For those who ask why I want to know: because I am writing JSON to a field in a MySQL database which will be consumed via R (amongst other avenues), and in R "1"+1 -> ERROR. I need to ensure the JSON value pairs are consistent, so it appears I need to be very careful (or use another version of Perl?) between generating the hash and encoding it before writing to the database.

Re: Hash value typing
by haukex (Archbishop) on Jun 10, 2018 at 10:12 UTC
Re: Hash value typing
by Anonymous Monk on Jun 10, 2018 at 10:17 UTC

    So BUG, or lack of understanding on my part? And how does one determine that $x{a} is a number versus a string?

    The encoder guesses by checking the scalar's internal flags. See perlnumber and Devel::Peek.
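As a rough illustration of "checking the flags", the sketch below reads a scalar's public value flags through the core B module. The helper name flag_names is made up for this example, and it only looks at freshly assigned values, because the flags left behind after an implicit conversion may differ between Perl versions. Devel::Peek's Dump shows the same information in more detail.

```perl
use strict;
use warnings;
use B ();

# Hypothetical helper (not part of any module): list the public
# value flags currently set on the scalar passed in.
sub flag_names {
    # \$_[0] refers to the caller's scalar itself, not to a copy.
    my $flags = B::svref_2object(\$_[0])->FLAGS;
    my @names;
    push @names, 'IOK' if $flags & B::SVf_IOK();   # holds a valid integer
    push @names, 'NOK' if $flags & B::SVf_NOK();   # holds a valid float
    push @names, 'POK' if $flags & B::SVf_POK();   # holds a valid string
    return join ',', @names;
}

my $int = 1;
my $str = "1";
my $flt = 1.5;

print "int: ", flag_names($int), "\n";
print "str: ", flag_names($str), "\n";
print "flt: ", flag_names($flt), "\n";
```

An encoder that inspects flags is making essentially this kind of check, which is why two values that print identically can serialize differently.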

    That's the deal with JSON: either the encoders do their best to guess, or you find an encoder that makes you tell it what to use (JSON::Schema?)... but the receiver shouldn't give a flying fig what type you send; it should convert to what it wants when it's validating what you send

      Thank you!
      Now that just leaves me with one more question: why do the Perl developers (or others, for that matter) consider it OK to change the internal representation of data (albeit in the narrow context of hash values) when it is referenced? I.e. from one reference to the next, Devel::Peek may (because it depends on the context of the reference) give you different results without you ever having consciously changed the data itself! My guess, going by my example, is that once the conversion of 1 from binary (a 32- or 64-bit integer) to the character "1" has happened (as the length function would trigger), someone decided it was likely to happen again, so save a few cycles and replace the binary with the character string - nobody's looking or will know, right? However, I have just shown one case where it matters!
      Again, thanks. As the wise old doctor said when told "Doctor, when I do this it hurts!": "Then don't do that!" Sort of like how '\' looks reasonable but doesn't do what you (well, more specifically, I) expect (before the pain reminds me not to do that)!

        Why do the Perl developers (or others for that matter) consider it OK to change the internal representation of data (albeit the narrow context of Hash values) when it is referenced?

        Because that's what loose typing is all about - and perl works that way, sorry. In your example, you did not reference/dereference anything; you coerced the numeric value to a string by treating it as a string (taking its length). Instead of creating a temporary variable as a typecast copy of the number value, perl stores that string value in the PV slot of the variable. The NV (numeric value) and/or IV (integer value) slots of the variable remain unaffected.

        Apparently, for the JSON::XS code the PV slot has priority over the NV and IV slots. If you want numbers to remain numbers in your JSON output, you would need to inspect the values and numify them, e.g.

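        #!/usr/bin/perl
        use JSON::XS;
        %x = (a => 1);
        # Build a fresh hash: values that look numeric are numified with 0+,
        # everything else is passed through unchanged.
        print length($x{a}), "~",
            encode_json( { map { $_ => $x{$_} =~ /^[\d.]+$/ ? 0+$x{$_} : $x{$_} } keys %x } ), "\n";
        __END__
        1~{"a":1}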
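On the original observation that $x{a}+0 did not "change it back": merely reading a value in numeric context leaves its string form in place, whereas assigning a numeric result back (the `$x += 0` idiom given in the JSON::XS documentation) replaces it. A minimal sketch of that difference follows. It uses the core JSON::PP module as a stand-in for JSON::XS, on the assumption that both treat these cases the same way, as their documentation describes.

```perl
use strict;
use warnings;
use JSON::PP;   # core module, used here in place of JSON::XS

my %x = (a => "1");                   # the value starts out as a string

my $before = encode_json(\%x);        # encoded as a string

my $unused = $x{a} + 0;               # numeric use only; nothing is assigned to $x{a}
my $after_read = encode_json(\%x);    # still encoded as a string

$x{a} += 0;                           # assigns a numeric result back into $x{a}
my $after_assign = encode_json(\%x);  # now encoded as a number

print "$before\n$after_read\n$after_assign\n";
# prints:
# {"a":"1"}
# {"a":"1"}
# {"a":1}
```

The same idea works in the other direction: `$x{a} .= ""` or `"$x{a}"` forces a string.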
        Why do the Perl developers (or others for that matter) consider it OK to change the internal representation of data (albeit the narrow context of Hash values) when it is referenced?

        Well, to be fair, it's the internal representation. (BTW, it's not just hash values, scalars in general work this way.) All you see happening here is, as you said, Perl internally "caching" the results of its automatic and usually transparent conversion between strings and numbers - a major feature of the language. To a Perl programmer, 123 and "123" are (almost) always the same thing, and that's A Good Thing ;-)

        I have just shown one case it matters!

        Yes, one downside of the transparent conversion between strings and numbers inside Perl is that libraries that convert from Perl's data format to another have this issue to overcome (Data::Dumper, Data::Dump, the various JSON modules, ...). Some choose to always output it in one format, others use heuristics to decide whether something looks like a string or a number, and yet others try to look at Perl's internal representation, which as you've seen can change very easily. All of these ways have their flaws.

        In Perl, figuring out the difference between 42 and "42" is nontrivial (and luckily, a Perl programmer usually doesn't have to care). If the consumer of Perl's output does happen to care, it's usually best to have some way to explicitly output one or the other.
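One way to be explicit, sketched here under assumptions: declare the intended type of each key once and coerce every value into a fresh copy just before encoding, so that whatever happened to the scalars earlier no longer matters. The type table, its key names, and the helper coerce_for_json are all hypothetical, and JSON::PP (core) stands in for JSON::XS.

```perl
use strict;
use warnings;
use JSON::PP;

# Hypothetical per-key type table; the key names are illustrative only.
my %type_of = (id => 'number', label => 'string');

# Return a new hash ref in which each value has been forced to its
# declared type. Keys missing from the table default to 'string'.
# Note: undef or non-numeric values declared as 'number' will trigger
# the usual warnings from Perl.
sub coerce_for_json {
    my ($data, $types) = @_;
    my %out;
    for my $key (keys %$data) {
        my $value = $data->{$key};
        my $type  = $types->{$key} // 'string';
        $out{$key} = $type eq 'number' ? $value + 0 : "$value";
    }
    return \%out;
}

my %row = (id => "7", label => 42);   # deliberately the "wrong" way round

# canonical() sorts the keys so the output order is stable.
my $json = JSON::PP->new->canonical;
print $json->encode( coerce_for_json(\%row, \%type_of) ), "\n";
# prints {"id":7,"label":"42"}
```

Because the coercion produces new values, the original hash can be printed, measured, or compared beforehand without affecting what the consumer (R, in this thread) receives.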