vr has asked for the wisdom of the Perl Monks concerning the following question:

use strict;
use warnings;
use feature 'say';

use Sereal qw/ encode_sereal decode_sereal /;
use Storable qw/ freeze thaw /;

say 'Storable: ', ${ thaw freeze \substr 'abc', 0 };
say 'Sereal: ', ${ decode_sereal encode_sereal \substr 'abc', 0 };

__END__

>perl 180524.pl
Use of uninitialized value in say at 180524.pl line 8.
Storable: 
Sereal: abc

I was passing lists of string references to mce_map for parallel processing. Because the strings are substrings of larger strings, I thought I'd take a shortcut: instead of creating temporary scalars, I pass references to substr return values directly. More precisely, the refs are collected in a large array (why would I store duplicates?), and this array then becomes the parameter for mce_map. I should have known they are not references to scalars but to lvalues, though.

Everything works fine until it doesn't, on a particular machine -- without, it turns out, Sereal installed. Storable, used by MCE in lieu of Sereal, treats these lvalue refs differently.

I didn't find anything googling for "Perl Storable lvalue", etc. Apart from my sloppy coding, I'm not even sure whether it's a Storable issue, a Sereal issue, or something else (maybe MCE should check for these?). Sereal is supposed to be, more or less, a faster drop-in replacement for Storable. Except, it seems, not always. Any thoughts?

Re: Subtle(?) issue(?) with lvalues and serialization
by Eily (Monsignor) on May 24, 2018 at 16:40 UTC

    Oh, fun! I didn't know about LVALUE references. For the curious:

    perl -E "$str = 'Hi World!'; $sub = \substr $str, 0, 2; $$sub = 'Hello'; say $str; say ref $sub"
    Hello World!
    LVALUE

    So the LVALUE is the magic behind how substr works, so that the output can walk and quack like a string scalar, except that changing its value will partially modify the content of another scalar.

    Now, there's a simple test to see which of Sereal and Storable does the correct thing. After the data has been serialized and deserialized, it should behave like the original data. For example:

    use feature 'say';
    use Data::Dump qw( pp );
    use Storable qw/ freeze thaw /;

    my $array = [0];
    push @$array, \$array->[0];
    my $copy = thaw freeze $array;
    $array->[0]++;
    $copy->[0]++;

    say "Array:";
    say join ", ", map pp($_), @$array;
    say pp $array;

    say "\nCopy:";
    say join ", ", map pp($_), @$copy;
    say pp $copy;

    __END__
    Array:
    1, \1
    do {
      my $a = [1, 'fix'];
      $a->[1] = \$a->[0];
      $a;
    }

    Copy:
    1, \1
    do {
      my $a = [1, 'fix'];
      $a->[1] = \$a->[0];
      $a;
    }

    So references to elements of a structure should turn into references to the clone element in the clone structure.

    Now let's try with substr:

    use feature 'say';
    use Data::Dump qw( pp );
    use Storable qw/ freeze thaw /;
    use Sereal qw/ encode_sereal decode_sereal /;

    my $struct = ["Hi perlmonks"];
    push @$struct, \substr($struct->[0], 0, 2);
    my $storable = thaw freeze $struct;
    my $sereal = decode_sereal encode_sereal $struct;

    pp $struct, $storable, $sereal;
    ${ $_->[1] } = "Hello", say $_->[0] for $struct, $storable, $sereal;

    __END__
    Can't handle LVALUE data at C:/Programs/Strawberry/perl/vendor/lib/Data/Dump.pm line 374.
    (
      ["Hi perlmonks", '#LVALUE#'],
      ["Hi perlmonks", \undef],
      ["Hi perlmonks", \"Hi"],
    )
    Hello perlmonks
    Hi perlmonks
    Hi perlmonks

    So IMHO both Sereal and Storable are incorrect, because they should at least warn about LVALUEs not being handled correctly (like Data::Dump does). In most cases I expect that Sereal is the next best thing though?
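
    Until the serializers warn by themselves, a check along these lines could be run before freezing. This is only a rough sketch: lvalue_paths is a hypothetical helper (not part of Storable, Sereal or MCE), it relies on Scalar::Util's reftype reporting 'LVALUE' for \substr(...) references, and it does not guard against cyclic structures.

    ```perl
    use strict;
    use warnings;
    use Scalar::Util qw( reftype );

    # Hypothetical helper: return the paths of all LVALUE refs found in a
    # nested structure of arrays, hashes and refs-to-refs.
    # No cycle detection, so do not feed it self-referential data.
    sub lvalue_paths {
        my ($data, $path) = @_;
        $path //= '$data';
        my $type = reftype $data;
        return unless defined $type;            # not a reference: nothing to report
        return $path if $type eq 'LVALUE';
        if ($type eq 'ARRAY') {
            return map { lvalue_paths($data->[$_], sprintf('%s->[%d]', $path, $_)) } 0 .. $#$data;
        }
        if ($type eq 'HASH') {
            return map { lvalue_paths($data->{$_}, sprintf('%s->{%s}', $path, $_)) } sort keys %$data;
        }
        return lvalue_paths($$data, '${' . $path . '}') if $type eq 'REF';
        return;                                 # plain SCALAR, CODE, etc.
    }

    my $str    = 'Hi perlmonks';
    my $struct = [ $str, \substr($str, 0, 2), { k => \substr($str, 3) } ];
    my @found  = lvalue_paths($struct);
    warn "LVALUE ref at $_\n" for @found;
    ```

    A caller could then decide whether to die, warn, or replace the offending refs with copies before calling freeze or encode_sereal.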

      ... because they should at least warn about LVALUEs not being handled correctly ...

      Now hold on for a second:

      Warn? Yes a warning would have been nice. Not handled correctly? No.

      The documentation of Storable promises that Storable will work for SCALAR, ARRAY, HASH or REF objects:

      ... persistence to your Perl data structures containing SCALAR, ARRAY, HASH or REF objects, i.e. anything that can be conveniently stored to disk and retrieved at a later time.

      There is no promise that Storable will handle lvalues, and it doesn't, so one may actually reason that it is functioning correctly.

      The example script below first shows the effect of storing the lvalue ref in an array and then setting the underlying string to undef outside of the array. After that, it shows that the same behavior cannot be observed after a round trip through Storable: the lvalue is invalidated and becomes undef. The same behavior can be observed when the lvalue ref is inside a hash, which is also the correct behavior if you ask me. I do believe a warning would have been nice though:

      use strict ;
      use warnings ;

      use Storable qw( freeze thaw ) ;
      use Data::Dumper ;

      my $abc = "abc" ;
      my @ar = ( \( substr( $abc, 0 ) ) ) ;
      print "ar_0 = ${$ar[0]}\n" ;
      my $def = "def" ;
      my @ar2 = ( \( substr( $def, 0 ) ) ) ;
      $def = undef ;
      print "ar2_0 = ${$ar2[0]}\n" ;
      $def = 'def' ;
      print "ar2_0 = ${$ar2[0]}\n" ;

      my $serialized1 = freeze \$abc ;
      my $serialized2 = freeze \substr $def, 0 ;
      $abc = 'ghi' ;
      $def = 'jkl' ;
      $abc = ${ thaw( $serialized1 ) } ;
      $def = ${ thaw( $serialized2 ) } ;

      print "thaw abc = $abc\n" ;
      print "thaw def = $def\n" ;

      my $xyz = "xyz" ;
      my %xyz = ( _xyz => \substr $xyz, 0 ) ;
      my $ser_xyz = freeze \%xyz ;
      $xyz = "uvw" ;
      my %copyxyz = %{ thaw( $ser_xyz ) } ;
      print Dumper( \%copyxyz ) ;

      __END__
      ar_0 = abc
      Use of uninitialized value in concatenation (.) or string at test2.pl line 13.
      ar2_0 = 
      ar2_0 = def
      thaw abc = abc
      Use of uninitialized value $def in concatenation (.) or string at test2.pl line 25.
      thaw def = 
      $VAR1 = {
                '_xyz' => \undef
              };

      Thank you everyone for valuable answers. Eily, maybe you'll find this funny, too:

      use strict;
      use warnings;
      use feature 'say';
      use Devel::Peek;
      use Storable qw/ freeze thaw /;

      my $s = 'abc';
      my $r = \substr $s, 1, 1;

      say ref $r;                   # LVALUE
      Dump $r;
      # say ${ thaw freeze $r };    # failure

      $$r = 'Z';
      say ref $r;                   # LVALUE again
      Dump $r;                      # ref target is now POK, PV is "Z"
      say ${ thaw freeze $r };      # "Z"

      After its first direct use, for the very purpose it was designed for and created in this test, the substr LVALUE magic is still there, yet the referent's POK flag is now set, and Storable is fooled into DWIMming.

      ref:
      The return value LVALUE indicates a reference to an lvalue that is not a variable. You get this from taking the reference of function calls like pos or substr.

      Why, here you are:

      use strict;
      use warnings;
      use feature 'say', 'state';
      use Devel::Peek;
      use Storable qw/ freeze thaw /;

      sub foo : lvalue { state $r; $$r }
      foo = 42;

      say ref \foo;                 # SCALAR
      Dump \foo;                    # Nothing interesting. Much
                                    # shorter output than one
                                    # full of magic, above.
      say ${ thaw freeze \foo };    # 42

      Wait, but that was exactly a "reference to an lvalue that is not a variable"! Hm-m, though, they didn't say the reverse is true... So the LVALUE that confuses Storable is limited to references to substr and pos (?).

      ----

      Storable, indeed, promises only to work with "SCALAR, ARRAY, HASH or REF objects". But it successfully deep-clones references to, e.g., variables long out of scope. From there, maybe it's not too far a step to cloning the string to which \substr refers.

      As interesting as the above would be (if it worked), in the OP I only wanted to pass values, not LVALUE magic. For that, Sereal DWIMs. Storable doesn't.

      But I really should have done as in the first line:

      @result = mce_map { ...work with $_ } substr(...)
      @result = mce_map { ...work with $$_ } \substr(...)
      @result = mce_map { ...work with $$_ } \( my $s = substr(...))

      (the 2nd parameter is an "array of" these, of course) and not as in either of the next 2 lines. The 2nd works only with Sereal. Depending on the length and number of strings, a benchmark shows that any of the 3 can be up to 30% faster than the others when the mce_map block is a no-op. That is irrelevant, though: the gain is tiny compared to the time required for the real job. Memory consumption was also almost the same for all 3, as duplicates were created anyway (by Sereal?) in the 2nd case.
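
      For an array of refs that has already been collected, the same copying can be applied after the fact. A minimal sketch, assuming only core Storable; plain_ref is a made-up helper name, and the thaw/freeze round trip merely stands in for whatever the serializer does inside MCE:

      ```perl
      use strict;
      use warnings;
      use Storable qw( freeze thaw );

      # Hypothetical helper: swap an LVALUE ref for a ref to a plain copy of
      # its current value; every other kind of ref is passed through as is.
      sub plain_ref {
          my ($ref) = @_;
          return ref($ref) eq 'LVALUE' ? \( my $copy = $$ref ) : $ref;
      }

      my $big  = 'abcdef';
      my @refs = map { plain_ref($_) } \substr($big, 0, 3), \substr($big, 3, 3);

      # Plain SCALAR refs survive the Storable round trip.
      my @out = map { my $clone = thaw(freeze($_)); $$clone } @refs;
      print "$_\n" for @out;
      ```

      The copies lose the link back to $big, which is fine here because only the values are needed on the worker side.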

      ----

      BrowserUk, the route you suggest makes perfect sense, if parallelism was fine-tuned by hand, especially since ultimate source of all strings (from which substrings were further extracted) is single large file with kind of TOC (long story...). But mce_map is so amazingly convenient to transparently add parallelism here, all long time consuming work happens per substring inside its block.

Re: Subtle(?) issue(?) with lvalues and serialization
by BrowserUk (Patriarch) on May 24, 2018 at 19:54 UTC

    Rather than passing an array of lvalue refs, I'd pass a string of numeric pairs that characterise those substrings; and then create the substrings (or references to them), on the other side.

    Ie. Pass $substrs = '10,4 15,3 20,32 ...'; and on the other side unpack them and create/process your substrings:

    doWhatever( substr( $bigString, $_->[0], $_->[1] ) ) for map [ split ',', $_ ], split ' ', $substrs;

    It may seem crude, but a packed string takes far less space than an array containing the same numbers, and unless your big string is larger than 64k (i.e. offsets need more than 16 bits), packing to variable-length ASCII saves space over packing to fixed-length binary with pack.
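
    Both sides of that idea might look roughly like this. It is only a sketch: the sample string and the offset/length pairs are invented for illustration, and in real use $packed would be what gets handed to the workers.

    ```perl
    use strict;
    use warnings;

    my $bigString = 'The quick brown fox jumps';

    # Sending side: describe each substring as "offset,length" and join the
    # pairs with spaces into a single string.
    my @pairs  = ( [ 4, 5 ], [ 10, 5 ], [ 16, 3 ] );
    my $packed = join ' ', map { join ',', @$_ } @pairs;

    # Receiving side: split the string back into pairs and cut the
    # substrings out of the big string there.
    my @substrings = map {
        my ( $offset, $length ) = split /,/, $_;
        substr $bigString, $offset, $length;
    } split ' ', $packed;

    print "$_\n" for @substrings;
    ```

    This assumes the receiving side has its own access to $bigString (for example because it was loaded before the workers were spawned).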


Re: Subtle(?) issue(?) with lvalues and serialization
by Veltro (Hermit) on May 24, 2018 at 15:06 UTC

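    The problem is the way you use substr.

    substr can be used as an lvalue, so taking a reference to its return value gives you an LVALUE reference rather than a reference to a plain scalar.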

    Just as an example, this works:

    say 'Storable: ', ${ thaw freeze \(my $tst = substr 'abc', 0) };