http://qs1969.pair.com?node_id=235730

A while ago I started on a data dumper that wasn't designed to be eval-able (like Data::Dumper) but human-readable and - more importantly - accurate.

I've just more-or-less finished the first version which fairly decently meets these goals, by doing a breadth-first walk and making use of B to analyze the data.

An example of what Data::XDumper does:

use Data::XDumper qw(Dump);
sub Test { unshift @_, { y => \@_ }; \@_ }
my $x = "foo";
my $data = Test($x, substr($x, 1, 2));
bless $data, 'Quux';
bless \$data, 'Bar';
Dump $data, $data;
with default settings produces the output:
$L001:  Bar \
@L003:     Quux @(
              {y => \@L003},
$L002:        'foo',
              substr($L002, 1, 2)
           )
        $L001
Here are direct links to the sources: Data::XDumper 1.03 and its prerequisite B::More 1.01. You can also browse them at CPAN.

•Update: found a major memory leak, fixed Feb 20 13:11:50 CET 2003 in 1.03

And a syntax-highlighted version for online reading (also updated)

I hope I can get some feedback on my approach and layout, and I need to know how robust it is. I've tried all kinds of input but the possibilities are endless, so perhaps other people can find things it breaks on.

•Update: what I mean with the above paragraph is: Could you please run the most disgusting piece of data you can think of through XDumper and report the results? :-)

•Re: Data::XDumper
by merlyn (Sage) on Feb 16, 2003 at 16:21 UTC
      I'm not trying to reinvent YAML in any way. I'm not interested in serializing data structures, nor in reading them in from different languages.

      I want an accurate dump of perl data structures. Neither Dumper nor Denter offer this, and I doubt YAML does either (although I haven't checked really - feel free to correct me)

      an example which shows where Dumper and Denter go wrong:
      http://tnx.nl/scribble/420FRAP
      how would YAML handle that?

      •Update: the above link died.. so here it is, expanded to include other dumping formats mentioned also:

        So, if the problem is that Data::Dumper dumps some structures wrong, then submit a patch to Data::Dumper, rather than inventing something different that a lot of people won't know about and leaving the buggy code in a core distribution.

        As for your example:

        $x = 4;
        $xx = sub{\@_}->(\$x, \$x);
        $y = \4;
        $yy = sub{\@_}->($y, $y);
        bless \$xx->[0], 'Foo';
        bless \$xx->[1], 'Bar';
        bless \$yy->[0], 'Foo';
        bless \$yy->[1], 'Bar';
        use Data::Dumper;
        print Dumper $xx, $yy;
        I suspect you haven't seen the Purity flag, which needs to be set in some more complex data, such as yours.
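For readers who have not used the Purity flag, here is a minimal sketch of what it changes. The self-referential `$node` structure is a hypothetical example, not the data from this thread, and the sketch says nothing about how Purity copes with the blessed, aliased scalars above. It only shows that with `$Data::Dumper::Purity` set, Dumper emits extra fix-up statements so the output can be eval-ed back into a cyclic structure.

```perl
use strict;
use warnings;
use Data::Dumper;

# Hypothetical self-referential structure, used only for illustration.
my $node = { name => 'root' };
$node->{self} = $node;

# Default settings: the cycle is written as a bare $VAR1 inside the literal.
my $plain = Dumper($node);

# With Purity set, Dumper appends statements that rebuild the cycle.
my $pure = do {
    local $Data::Dumper::Purity = 1;
    Dumper($node);
};

# Evaluate the Purity output to reconstruct the structure.
my $copy = do {
    my $VAR1;
    eval $pure;
    die $@ if $@;
    $VAR1;
};

print $pure;
print "cycle restored\n" if $copy->{self} == $copy;
```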

        -- Randal L. Schwartz, Perl hacker
        Be sure to read my standard disclaimer if this is a reply.

        To answer my own question, YAML::Dump handles it very poorly.

        Like Dumper and Denter, it doesn't see the difference between $xx and $yy. It also appears to ignore blessings, which I find rather difficult to believe.

        Plus if I feed it a bit more complex structure (the example I included with XDumper), YAML gives me: Can't create YAML::Node from 'GLOB' at /Library/Perl/YAML/Transfer.pm line 41

        So I don't think YAML can be compared to XDumper. They simply have different goals and different advantages.
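The difference between $xx and $yy in the example quoted earlier in this thread can be checked without any dumper at all. The sketch below reuses that example and only adds `refaddr` from Scalar::Util to compare slot addresses. As far as I can tell, $xx holds two separate reference scalars that both point at $x, while $yy holds the single scalar $y aliased twice, so the second bless on $yy overwrites the first.

```perl
use strict;
use warnings;
use Scalar::Util qw(refaddr);

my $x  = 4;
my $xx = sub { \@_ }->(\$x, \$x);   # two distinct reference scalars, both pointing at $x
my $y  = \4;
my $yy = sub { \@_ }->($y, $y);     # the same scalar $y, aliased into both slots

bless \$xx->[0], 'Foo';
bless \$xx->[1], 'Bar';
bless \$yy->[0], 'Foo';
bless \$yy->[1], 'Bar';             # same scalar as slot 0, so this re-blesses it

# Report the class and address of each slot.
for my $pair ([ xx => $xx ], [ yy => $yy ]) {
    my ($name, $array) = @$pair;
    printf "%s: slot 0 is %s, slot 1 is %s, same scalar: %s\n",
        $name,
        ref(\$array->[0]),
        ref(\$array->[1]),
        refaddr(\$array->[0]) == refaddr(\$array->[1]) ? 'yes' : 'no';
}
```

A dumper that prints both structures identically has lost exactly this distinction.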

Re: Data::XDumper
by demerphq (Chancellor) on Feb 17, 2003 at 04:48 UTC

    Could you please run the most disgusting piece of data you can think of through XDumper,

    Please examine the test cases in Data::BFDump and Data::Dumper. If you wish even tougher test cases then contact me directly and I'll give you some doozies. broquaint was also helpful, ask him directly. :-)

    A comment on this whole enterprise: Dumping perl data structures is a lot harder a problem than it seems on face value. Don't let that put you off though.

    Good luck. :-)

    ---
    demerphq


Re: Data::XDumper
by demerphq (Chancellor) on Feb 17, 2003 at 15:50 UTC

    Here is a set of test cases that I developed for a serialization comparison test suite that I wrote (unpublished). To understand it and convert it to your own uses, all you need to remember is that serializer_ok takes the following parameters:

    serializer_ok($dumper_object, $object_to_dump, $test_name)

    $dumper_object is a wrapper around various dumper implementations.
    $object_to_dump is a single value to dump (multiples can be wrapped in an array).

    Incidentally this routine does a low-level comparison of the results of the serialization, as well as a twice-in-a-row test (i.e. is $a->dumped->evaled->dumped == $a?),
    and Data::Tools->capture() is functionally equivalent to the following sub:

    sub capture { \@_ }
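The twice-in-a-row test can be sketched roughly like this, using Data::Dumper as the serializer. This is not the unpublished serializer_ok: `roundtrip_ok` is a hypothetical helper written for illustration, and it compares only the dumped text, not the low-level structure.

```perl
use strict;
use warnings;
use Data::Dumper;

sub capture { \@_ }

# Hypothetical helper: dump, eval the dump, dump the result again,
# and report whether both dumps are textually identical.
sub roundtrip_ok {
    my ($data) = @_;
    local $Data::Dumper::Purity   = 1;   # emit fix-ups for cyclic data
    local $Data::Dumper::Sortkeys = 1;   # stable hash key order
    my $first = Dumper($data);
    my $VAR1;
    eval $first;
    return 0 if $@;
    my $second = Dumper($VAR1);
    return $first eq $second ? 1 : 0;
}

# A plain nested structure and a self-referential one.
my $simple = capture(1, 'two', { three => [3] });
my $cyclic = {};
$cyclic->{self} = $cyclic;

printf "simple: %s\n", roundtrip_ok($simple) ? 'ok' : 'not ok';
printf "cyclic: %s\n", roundtrip_ok($cyclic) ? 'ok' : 'not ok';
```

Passing this check only shows that the dump is stable under a round trip. It does not show that the dump is accurate, since information the dumper never records (such as aliasing) is lost identically both times.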

    Incidentally I welcome correspondence about your dumper and testing it. When I get a chance to review your code I will add it to my serialization tests (provided that this is possible; as I said, I haven't reviewed XDumper yet).

    HTH

    ---
    demerphq


      Thanks.. it has already helped me notice that when I recently "fixed" formatting of code refs, I actually broke them :-)

      I've included the output of my latest version (not yet uploaded)..

      I'll reupload as soon as I've fixed code refs.
      •Update: I think I fixed them. I updated the above dump too

        I'm very impressed. :-) I don't like the notation but I am very impressed indeed. A couple of niggles though. It looks like the results of the "Dog Kennel" tests are inconsistent. I'd also like to see the results of

        { my ($x,$y,$z); my $to_dump=capture(capture($x,$y,$z),$x,$y,$z); }

        I'll be checking this module out very soon indeed. ++ to you.

        ---
        demerphq


XDumper Grammar
by xmath (Hermit) on Feb 17, 2003 at 23:37 UTC
    Here is a brief pseudo-grammar for XDumper's output. Note that whitespace is irrelevant and only used for clarity.
    item    -> var | special | label
    var     -> ( label ':' )? classname? '<ro>'? (scalar | array | hash | glob | code | io | lvalue )
    scalar  -> number | string | ref | 'undef'
    ref     -> '<weak>'? ( '\' item | '[' list? ']' | '{' hlist? '}' )
    array   -> '@(' list? ')'
    hash    -> '%(' hlist? ')'
    list    -> item (',' item)*
    hlist   -> key '=>' item (',' key '=>' item)*
    glob    -> '<anon>'? '*' package? name
    code    -> '<format>'? '&(' filename ':' linenum ')'
    io      -> '<io>'
    lvalue  -> substr | pos | vec
    substr  -> 'substr(' item ',' number ',' number ')'
    pos     -> 'pos(' item ')'
    vec     -> 'vec(' item ',' number ',' number ')'
    special -> '<undef>' | '<yes>' | '<no>'

    In particular note that [...] is an abbreviation for \@(...) and {...} is an abbreviation for \%(...).

    The abbreviated forms are used whenever possible, but sometimes the expanded form is required, for example when the array or hash has a prefix (blessing, read-only) or a label.

    For example $L001: <ro> \Foo @(1, 2) means that label $L001 names a read-only reference to a Foo-blessed array containing the numbers 1 and 2.

    For people who know the perl guts: 'var', 'ref', 'array', 'hash' etc in the above grammar correspond one-to-one with SV, RV, AV, HV etc.

    •Update: added format and io
    •Update: added lvalues (completely forgot them earlier)