nabeel has asked for the wisdom of the Perl Monks concerning the following question:

Hello, I ran into a problem with Data::Dumper segfaulting on high-bit characters. Only the XS version segfaults; the pure-Perl version shows warnings but doesn't break completely. Here is a bit of code to reproduce the problem:
#!/usr/bin/perl -w
use Encode;
use utf8;
use Data::Dumper;

#$Data::Dumper::Useperl = 1;

my $string = 'ä, ö, ü / Ä, Ö, Ü, ß ¿Adónde vas?';
$string = encode_utf8( $string );
my $print_count = 1;
my $hash = { key => $string };

my $tmp = Dumper $hash;
print "1:$tmp\n";

# Turning $Data::Dumper::Useperl to 0 causes a segfault in the second dumper;
$hash = eval $tmp;
print "Value is : $hash->{key}\n";
$tmp = Dumper $hash;
print "2:$tmp\n";

$hash = eval $tmp;
print "Value is : $hash->{key}\n";
$tmp = Dumper $hash;
print "2:$tmp\n";
I am using perl 5.8.4 on Debian Sarge. Any ideas on what I can do to fix this would be great, as I am completely flummoxed.

Thanks,
Nabeel

Re: Data::Dumper segfaulting.
by dave_the_m (Monsignor) on Apr 16, 2007 at 11:28 UTC
    While Data::Dumper clearly shouldn't segfault, the problem is triggered by your 'use utf8': it is in scope during the evals, but what you're evalling isn't valid UTF-8.

    Dave.
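
A small sketch may help show what 'use utf8' and encode_utf8() each do to the string. It is not taken from the thread: the variable names are illustrative, and it assumes only the core Encode module and a source file saved as UTF-8.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use utf8;                                   # string literals below are decoded as characters
use Encode qw(encode_utf8 decode_utf8);

my $chars = 'ä, ö, ü';                      # 7 characters
my $bytes = encode_utf8($chars);            # the same text as 10 UTF-8 bytes

printf "characters: %d\n", length $chars;
printf "bytes:      %d\n", length $bytes;

# decode_utf8() reverses the encoding and gives the characters back.
my $decoded = decode_utf8($bytes);
print "decode restores the original: ", ( $decoded eq $chars ? "yes" : "no" ), "\n";

# Encoding a second time treats each byte as a character and inflates the
# string again, which is a common source of "Ã¤"-style output.
my $twice = encode_utf8($bytes);
printf "double-encoded bytes: %d\n", length $twice;
```

With 'use utf8' in effect the literal is already a character string, so the explicit encode_utf8() produces a byte string that later code may decode or encode a second time.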

      OK, as Dave suggested, I tried taking 'use utf8' out, but that gives me some really wacky output:
      1:$VAR1 = { 'key' => 'ä, ö, ü / à , Ã, Ã, à ¿Adónde vas?' };
      Value is : ä, ö, ü / à , Ã, Ã, à ¿Adónde vas?
      2:$VAR1 = { 'key' => 'ä, ö, ü / à , Ã, Ã, à ¿Adónde vas?' };
      Value is : ä, ö, ü / à , Ã, Ã, à ¿Adónde vas?
      2:$VAR1 = { 'key' => 'ä, ö, ü / à , Ã, Ã, à ¿Adónde vas?' };
      Which I am guessing is just the byte values being shown. The good news is that it doesn't segfault, but the characters are not what I want. What would be great is a way to take an arbitrarily complex hash structure, which can contain high-bit characters, write the Dumper output of it to a file (or just get it back in a variable), and eval it back in without any change in information. It seems to be harder than it sounds, but it is certainly making for interesting times :-).
        Well, although it's difficult to tell what you expect (since the HTML display of perlmonks may be adding another layer of utf8/bytes display confusion), it looks to me from the above that Data::Dumper is faithfully reproducing what it's being given. What it's being given just isn't what you expect, because of the spurious encode_utf8(), which isn't required.

        Dave.
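
One way to get the round trip asked for above is to keep the data as characters (no encode_utf8()) and ask Data::Dumper for double-quoted, escaped output. The following is only a sketch under those assumptions: round_trip() is a hypothetical helper, not something from the thread, and it has not been tried on perl 5.8.4.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use utf8;                                    # the literal below is UTF-8 source text
use Data::Dumper;

# Hypothetical helper: dump a structure to Perl source and eval it back.
sub round_trip {
    my ($data) = @_;
    local $Data::Dumper::Useqq    = 1;       # double-quoted output with non-ASCII escaped
    local $Data::Dumper::Sortkeys = 1;       # stable key order, handy when comparing dumps
    my $dumped = Dumper($data);
    my $VAR1;                                # the dump assigns to $VAR1; declare it for strict
    my $copy = eval $dumped;
    die "eval of dump failed: $@" if $@;
    return ( $dumped, $copy );
}

my $hash = { key => 'ä, ö, ü / Ä, Ö, Ü, ß ¿Adónde vas?' };   # characters, not bytes
my ( $dumped, $copy ) = round_trip($hash);

binmode STDOUT, ':encoding(UTF-8)';          # encode once, at the output boundary
print $dumped;
print "Value is : $copy->{key}\n";
print "Round trip ", ( $copy->{key} eq $hash->{key} ? "ok" : "FAILED" ), "\n";
```

Because the escaped dump contains only ASCII, it can be written to a file and evalled later regardless of whether 'use utf8' is in scope. The Data::Dumper documentation for older releases notes that Useqq is handled by the pure-Perl implementation, so this may cost some speed there.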

Re: Data::Dumper segfaulting.
by GrandFather (Saint) on Apr 16, 2007 at 10:00 UTC

    A workaround may be to use Data::Dump::Streamer instead:

    #!/usr/bin/perl -w
    use Encode;
    use utf8;
    use Data::Dump::Streamer;

    #$Data::Dumper::Useperl = 1;

    my $string = 'ä, ö, ü / Ä, Ö, Ü, ß ¿Adónde vas?';
    $string = encode_utf8( $string );
    my $print_count = 1;
    my $hash = { key => $string };

    my $tmp = Dump ($hash)->Out ();
    print "1:$tmp\n";

    # Turning $Data::Dumper::Useperl to 0 causes a segfault in the second dumper;
    $hash = eval $tmp;
    print "Value is : $hash->{key}\n";
    $tmp = Dump ($hash)->Out ();
    print "2:$tmp\n";

    $hash = eval $tmp;
    print "Value is : $hash->{key}\n";
    $tmp = Dump ($hash)->Out();
    print "2:$tmp\n";

    Prints:

    Subroutine B::SV::object_2svref redefined at C:/Perl/lib/DynaLoader.pm line 253.
    1:$HASH1 = { key => "\344, \366, \374 / \304, \326, \334, \337 \277Ad\363nde vas?" };
    Value is : ä, ö, ü / Ä, Ö, Ü, ß ¿Adónde vas?
    2:$HASH1 = { key => "\344, \366, \374 / \304, \326, \334, \337 \277Ad\363nde vas?" };
    Value is : ä, ö, ü / Ä, Ö, Ü, ß ¿Adónde vas?
    2:$HASH1 = { key => "\344, \366, \374 / \304, \326, \334, \337 \277Ad\363nde vas?" };

    DWIM is Perl's answer to Gödel
      Thanks for the quick reply. I just had a look at Data::Dump::Streamer. The only thing is, I need to use Data::Dumper (or something like it) in a core part of a large system which is used VERY often, so I liked the fact that the XS version of Dumper was fast (well, faster than the pure-Perl version). I haven't done any benchmarking on Data::Dump::Streamer, but I think the documentation said that it sacrifices speed for memory, whereas I am happy to go the other way :-). Still, it might be worth a try, as it seems to deal with the high-bit characters well.

      Nabeel
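
The speed question could be checked with the core Benchmark module. This is a rough sketch rather than a measurement from the thread: the test data is made up, and Data::Dump::Streamer is only included if it happens to be installed.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Benchmark qw(timethese cmpthese);
use Data::Dumper;

# Hypothetical test data: 50 keys, each holding a small array.
my $data = { map { ( "key$_" => [ 1 .. 10 ] ) } 1 .. 50 };

my %candidates = (
    # Useperl = 0 lets Data::Dumper use its XS code when that is available.
    xs        => sub { local $Data::Dumper::Useperl = 0; my $out = Dumper($data); return },
    # Useperl = 1 forces the pure-Perl implementation.
    pure_perl => sub { local $Data::Dumper::Useperl = 1; my $out = Dumper($data); return },
);

# Only benchmark Data::Dump::Streamer if it can be loaded.
if ( eval { require Data::Dump::Streamer; 1 } ) {
    $candidates{streamer} = sub {
        my $out = Data::Dump::Streamer::Dump($data)->Out();
        return;
    };
}

# Run each candidate for at least one CPU second, then print a comparison table.
my $results = timethese( -1, \%candidates, 'none' );
cmpthese($results);
```

The numbers will depend heavily on the shape of the real data, so it is worth swapping in a representative structure before drawing conclusions.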