Limbic~Region has asked for the wisdom of the Perl Monks concerning the following question:

All,
I am writing a program that transforms XML into a Perl data structure using XML::Twig. It is fairly straightforward code:
for my $child ($root->children) {
    my $handler = $child->tag;
    if ($dispatch{$handler}) {
        $dispatch{$handler}->($child, $data_struct);
    }
    else {
        die "Haven't written handler for '$handler' yet";
    }
}
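For concreteness, here is a minimal self-contained sketch of what the %dispatch table behind that loop might look like. The tag names (user, event), the sample message, and the shape of the resulting structure are all made up for illustration; only the calling convention ($child, $data_struct) comes from the loop itself.

```perl
use strict;
use warnings;
use XML::Twig;

# Hypothetical handlers: each receives the child element and the
# structure being built, in the same order the loop passes them.
my %dispatch = (
    user  => sub {
        my ($elt, $data) = @_;
        $data->{user} = $elt->text;
    },
    event => sub {
        my ($elt, $data) = @_;
        push @{ $data->{events} },
            { type => $elt->att('type'), text => $elt->text };
    },
);

# Made-up sample message.
my $xml = '<msg><user>alice</user><event type="login">ok</event></msg>';

my $twig = XML::Twig->new();
$twig->parse($xml);
my $root = $twig->root;

my $data_struct = {};
for my $child ($root->children) {
    my $handler = $child->tag;
    if ($dispatch{$handler}) {
        $dispatch{$handler}->($child, $data_struct);
    }
    else {
        die "Haven't written handler for '$handler' yet";
    }
}
```

Because the handlers are ordinary code references, they can take whatever arguments the caller chooses, which is the flexibility the built-in handlers do not offer.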
At this point, you might be asking why I am not using XML::Twig's built-in handlers. As far as I can tell, you can only define them to be subroutine references, and the argument list is fixed by XML::Twig. There are some particularly ugly solutions to this problem I have thought of. For instance, I could modify a global in the handler, then copy the data structure to a lexical outside of the handler and undef the global. I could also use some currying/closure/symbol table manipulation, but none of that seems very elegant.

My question is this: Is there a smarter way to do this than rolling my own dispatch table? I am locked in to using Expat, so if you suggest another XML module, it would need to use the Expat library under the hood.

Cheers - L~R

Re: Convert XML To Perl Data Structures Using XML::Twig
by runrig (Abbot) on May 24, 2011 at 17:36 UTC
    Have you looked at XML-Rules? It also uses expat under the hood, and once you get your head wrapped around it, can be very natural.
Re: Convert XML To Perl Data Structures Using XML::Twig
by anonymized user 468275 (Curate) on May 24, 2011 at 16:07 UTC
    I would subclass XML::Twig in a simple package with only a new method that receives your data (and/or possibly a "set" method to do that) and a handler method that trivially processes the data from there using Twig's own handlers.

    One world, one people

      Really? That seems like a better alternative than writing your own dispatch table? Even using the built-in handlers, you still have to write the subroutines themselves. I guess from an abstraction perspective it is appealing, but consider that my existing code looks like:
      my $twig = XML::Twig->new();
      while (<$fh>) {
          $twig->parse($_);
          # ...
      }
      It will now need to look like:
      while (<$fh>) {
          my $twig = XML::Twig->new($data_struct);
          $twig->parse($_);
          # ...
      }
      I have no idea the extent of the performance overhead but it is probably less than the alternative:
      my $twig = XML::Twig->new($data_struct);
      while (<$fh>) {
          $twig->parse($_);
          # ...
          my $copy = deep_copy($data_struct);
          $data_struct = clear_structure($data_struct);
      }
      I am not saying it is a bad idea - it just doesn't feel clean to me.

      Cheers - L~R

        Hmm, I guess what counts as "clean" is just a question of personal style. But I do actually agree with the idea of the dispatch table; it's just that I would put it in an instance variable of my own class. Sorry if that wasn't clear in my post. I would need to see more of the code to have a really good idea of what "clean" means relative to it, without going against your style.

        One world, one people

Re: Convert XML To Perl Data Structures Using XML::Twig
by stefbv (Priest) on May 24, 2011 at 18:26 UTC

    Maybe the 'simplify' method would work.

    use Data::Dumper;
    use XML::Twig;

    my $file = $ARGV[0] || 'file.xml';
    my $twig = XML::Twig->new();
    my $config = $twig->parsefile( $file )->simplify(
        # keyattr    => 'key',
        # group_tags => { columns => 'column', },
    );
    print Dumper( $config );
Re: Convert XML To Perl Data Structures Using XML::Twig
by mirod (Canon) on May 25, 2011 at 07:19 UTC
      mirod,
      I love closures. See How A Function Becomes Higher Order, and Understanding And Using Iterators for examples ;-)

      I believe I did a poor job of explaining my goal and my hangup. I am processing a log with millions of XML messages. Each message must be converted to a distinct perl data structure. While I can see several ways of accomplishing this, none of them seem to let me have my cake and eat it too.

      To use a closure in the way you describe, I would need a factory to create a brand new closure for each message and either instantiate a new XML::Twig object for each message or call $twig->setTwigHandlers() between calls to $twig->parse(). The alternative would be to leave the XML::Twig object alone and perform a deep copy and "reset" of the reference that was closed over between messages.
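The first variant described above (a factory plus setTwigHandlers() between parses) could be sketched roughly as follows. The make_handlers name, the tag names, and the sample messages are hypothetical; the sketch also assumes, as the loops elsewhere in this thread do, that one XML::Twig object can be reused for successive parse() calls.

```perl
use strict;
use warnings;
use XML::Twig;

# Hypothetical factory: returns a fresh set of handlers, each one
# closed over the $msg hash reference for the current message.
sub make_handlers {
    my ($msg) = @_;
    return {
        # XML::Twig calls each handler with ($twig, $element).
        user  => sub { my ($t, $elt) = @_; $msg->{user}  = $elt->text; },
        event => sub { my ($t, $elt) = @_; $msg->{event} = $elt->text; },
    };
}

# Made-up sample input, one XML message per line.
my @lines = (
    '<msg><user>alice</user><event>login</event></msg>',
    '<msg><user>bob</user><event>logout</event></msg>',
);

my $twig = XML::Twig->new();
my @messages;
for my $line (@lines) {
    my $msg = {};                                     # distinct structure per message
    $twig->setTwigHandlers( make_handlers($msg) );    # swap in new closures
    $twig->parse($line);
    push @messages, $msg;
}
```

The cost being weighed here is visible in the loop: every message allocates a new set of closures and re-registers them, whereas a hand-rolled dispatch table is built once.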

      My compromise, which I am fine with, was to write my own dispatch table where I could do something akin to:

      # ...
      my $twig = XML::Twig->new();
      while (<$fh>) {
          chomp;
          my $msg = {};
          $twig->parse($_);
          for my $child ($twig->root->children) {
              my $handler = $child->tag;
              if ($dispatch{$handler}) {
                  $dispatch{$handler}->($child, $msg);
              }
              else {
                  die "Haven't written handler for '$handler' yet";
              }
          }
          # do something with $msg
      }

      I asked for advice here to make sure I wasn't missing anything obvious. I will certainly check out simplify.

      Cheers - L~R

        Show me a few example messages and the desired datastructure and let's see how it goes with XML::Rules ... and whether you like the resulting code. This really looks like a perfect task for that module.

        Jenda
        Enoch was right!
        Enjoy the last years of Rome.

Re: Convert XML To Perl Data Structures Using XML::Twig
by admiral_grinder (Pilgrim) on May 26, 2011 at 13:10 UTC
    Have you used XML::Simple? It does exactly that: it takes a chunk of XML and turns it into a Perl structure.
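A minimal sketch of that suggestion, assuming a made-up message format. Setting $XML::Simple::PREFERRED_PARSER to 'XML::Parser' is one way to ask XML::Simple to use the Expat-based parser, which matters given the Expat constraint in the original question; the ForceArray and KeyAttr values are illustrative choices, not a recommendation.

```perl
use strict;
use warnings;
use XML::Simple qw(XMLin);

# Ask XML::Simple to use XML::Parser (built on Expat) rather than a SAX parser.
$XML::Simple::PREFERRED_PARSER = 'XML::Parser';

# Made-up sample message.
my $xml = '<msg id="1"><user>alice</user><event>login</event></msg>';

# ForceArray => 0 leaves single child elements as plain scalars;
# KeyAttr => [] turns off folding of lists into hashes by attribute.
my $data = XMLin($xml, ForceArray => 0, KeyAttr => []);
```

With these options the root element name is dropped and its attributes and child elements become keys of one hash, so the structure is dictated by XML::Simple's conventions rather than by per-tag handlers.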