rodd has asked for the wisdom of the Perl Monks concerning the following question:

Howdy monks. I have this application that uses YAML::XS which has worked wonderfully well... until I hit this problem:

    use strict;
    use warnings;
    use feature 'say';
    use YAML::XS;
    my @arr;
    push @arr, { xx => \99 } for 1..2;
    my $arr2 = YAML::XS::Load( YAML::XS::Dump( \@arr ) );
    $arr2->[0]->{xx} = 'THIS VALUE IS MEANT TO CHANGE ONE ROW ONLY';
    say "NO WAY!" if $arr2->[0]->{xx} eq $arr2->[1]->{xx};   # which prints NO WAY!

I don't get it. In the for loop, doesn't \99 resolve to 2 different references to 2 different values (each holding 99)? So why, after dumping/loading, does writing to row 0 also modify the hash in row 1?

Actually, if you view the YAML text generated by Dump():

    ---
    - xx: &1 !!perl/ref
        =: 99
    - xx: *1

This tells you there's an alias (*1 ==> &1) that makes both xx attributes point to the same place. The problem is that Load() seemingly treats both xx values as the same memory location, which is not remotely a decent interpretation of the serialized data: "hold a reference to the same data" does not mean "we're at the same address".

I wish I could switch to JSON or another YAML module (YAML::Syck, YAML, etc. do not have this behavior), but right now some serious production code depends on YAML::XS (and the speed is nice too), so no changes can be made in the near future.

So I dove into the YAML::XS C code, but could not find the culprit:

https://github.com/ingydotnet/yaml-libyaml-pm/blob/master/LibYAML/perl_libyaml.c

Rod

Re: YAML scalar ref strange behavior (JSON++)
by tye (Sage) on Jun 04, 2014 at 02:45 UTC

    Yeah, YAML has such an unreasonably large spec that it ends up having plenty of room for unpleasant surprises. (It is also too large a spec for any single person to even come close to fully understanding, so nobody can implement it correctly, and you get disagreements between implementations on even fairly simple things, IME.)

    But, no, \99 in a loop isn't guaranteed to give you references to distinct scalars, as is easily shown:

        my @x;
        push @x, { xx => \99 } for 1, 2;
        print "same ref\n" if $x[0]{xx} == $x[1]{xx};

    But YAML then makes the problem worse. We start out with two distinct scalars each holding references to the same constant. YAML turns that into the two hash values being aliases to the same scalar (holding a single reference to the scalar value of 99). Clearly, the YAML spec needs some additions so that distinct references to the same scalar can be encoded differently than a single reference being held in two aliases within the data structure. ;)
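    To illustrate the distinction drawn above without involving YAML at all, here is a small plain-Perl sketch (the variable names are illustrative, not from the thread). It first compares the referents, which simply reports whatever your perl does with \99 (tye's snippet reports the same ref), and then compares the hash values themselves. Only the second comparison is guaranteed: the two hash values are always separate scalars, so overwriting one should leave the other alone. That is exactly the property the dump/load round trip loses.

```perl
use strict;
use warnings;

# tye's setup: two hashes, each storing the result of \99.
my @x;
push @x, { xx => \99 } for 1, 2;

# Compare the referents. tye's snippet shows these can be the same
# compiled constant; this is not something to rely on either way.
my $same_referent = $x[0]{xx} == $x[1]{xx};

# Compare the containers (the hash values), not what they point to.
# These are always two separate scalars.
my $distinct_slots = \$x[0]{xx} != \$x[1]{xx};

# Because the slots are separate, overwriting one leaves the other alone.
$x[0]{xx} = 'changed';
my $other_untouched = ref $x[1]{xx} eq 'SCALAR' && ${ $x[1]{xx} } == 99;

print "same referent: ",        ($same_referent   ? "yes" : "no"), "\n";
print "distinct slots: ",       ($distinct_slots  ? "yes" : "no"), "\n";
print "other slot untouched: ", ($other_untouched ? "yes" : "no"), "\n";
```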

    I can't tell you how to fix the problem, since you are quick to rule out the likely fixes, so I suspect you'd rule out any fix I might suggest. Several alternative fixes seem easy to come up with, so my guess is that you aren't considering them because you have reasons to eliminate them.

    Though I certainly encourage you to start moving to JSON (which has a spec so tiny that it borders on trivial to understand, apart from some slight complexity around character-encoding details that often just don't matter). I very much don't want my "transfer data" encoding to support the concept of aliases, which is just one of the many reasons that I like JSON.

    - tye        

      Ok, your example made me realize that the best approach for now would be generating a unique reference using... eval:

          my @arr;
          push @arr, { xx => eval '\99' } for 1..2;
          say YAML::XS::Dump(\@arr);

      Which prints:

          ---
          - xx: !!perl/ref
              =: 99
          - xx: !!perl/ref
              =: 99

      which Loads fine when used in place of the original, working around the problem.
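      A sketch of another way to get a fresh scalar per row without string eval (this alternative is not from the thread): declare a lexical inside the loop body and take a reference to it. Because the previous iteration's variable is still referenced from @arr, perl allocates a new scalar each time through the loop. One assumed trade-off is that the referent is writable, whereas \99 pointed at a read-only constant. The optional YAML::XS::Dump call only runs if the module is installed. Judging by the eval output above, you would expect no &1/*1 pair, but check that on your own setup.

```perl
use strict;
use warnings;

my @arr;
for (1 .. 2) {
    # The previous iteration's $v is still referenced from @arr, so perl
    # allocates a fresh scalar here; each row gets its own copy of 99.
    my $v = 99;
    push @arr, { xx => \$v };
}

my $distinct = $arr[0]{xx} != $arr[1]{xx};
print "referents: ", ($distinct ? "distinct" : "shared"), "\n";

# Optional: if YAML::XS is installed, dump the structure so you can
# check by eye whether an anchor/alias pair appears.
if (eval { require YAML::XS; 1 }) {
    print YAML::XS::Dump(\@arr);
}
```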