in reply to Re: Convert XML To Perl Data Structures Using XML::Twig
in thread Convert XML To Perl Data Structures Using XML::Twig

mirod,
I love closures. See How A Function Becomes Higher Order, and Understanding And Using Iterators for examples ;-)

I believe I did a poor job of explaining my goal and my hang-up. I am processing a log with millions of XML messages, and each message must be converted to a distinct Perl data structure. While I can see several ways of accomplishing this, none of them seems to let me have my cake and eat it too.

To use a closure in the way you describe, I would need a factory to create a brand-new closure for each message, and then either instantiate a new XML::Twig object per message or call $twig->setTwigHandlers() between each call to $twig->parse(). The alternative would be to leave the XML::Twig object alone and perform a deep copy and "reset" of the reference that was closed over between messages.
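
To make that concrete, here is an untested sketch of the factory version I have in mind; the <Tag> handler is just a placeholder, not the real message schema:

use XML::Twig;

# Build a fresh handler set that closes over this message's target hashref.
sub make_handlers {
    my ($msg) = @_;
    return {
        Tag => sub {
            my ($twig, $elt) = @_;
            $msg->{ $elt->tag } = $elt->text;
        },
    };
}

my $twig = XML::Twig->new();
while (my $xml = <$fh>) {
    chomp $xml;
    my $msg = {};
    # Swap in new closures before every parse.
    $twig->setTwigHandlers( make_handlers($msg) );
    $twig->parse($xml);
    # do something with $msg
}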

My compromise, which I am fine with, was to write my own dispatch table so that I could do something akin to:

# ...
my $twig = XML::Twig->new();
while (<$fh>) {
    chomp;
    my $msg = {};
    $twig->parse($_);
    for my $child ($twig->root->children) {
        my $handler = $child->tag;
        if ($dispatch{$handler}) {
            $dispatch{$handler}->($child, $msg);
        }
        else {
            die "Haven't written handler for '$handler' yet";
        }
    }
    # do something with $msg
}
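
For reference, a single %dispatch entry would look something like this (the element and attribute names are placeholders, not the real schema):

my %dispatch = (
    Tag => sub {
        my ($child, $msg) = @_;
        # Pull whatever this element carries into the message structure.
        $msg->{ $child->att('attribute') } = $child->text;
    },
);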

I asked for advice here to make sure I wasn't missing anything obvious. I will certainly check out XML::Twig's simplify() method.
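
For anyone following along, simplify() should turn a parsed message into a plain data structure in one call; roughly (untested, options not tuned):

# Untested: let simplify() build the structure instead of hand-rolled handlers.
my $twig = XML::Twig->new();
$twig->parse($xml);
my $data = $twig->root->simplify( forcearray => 0, keyattr => [] );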

Cheers - L~R

Re^3: Convert XML To Perl Data Structures Using XML::Twig
by Jenda (Abbot) on May 25, 2011 at 15:18 UTC

    Show me a few example messages and the desired data structure, and let's see how it goes with XML::Rules ... and whether you like the resulting code. This really looks like a perfect task for that module.

    Jenda
    Enoch was right!
    Enjoy the last years of Rome.

Re^3: Convert XML To Perl Data Structures Using XML::Twig
by mirod (Canon) on May 25, 2011 at 14:09 UTC
      mirod,
      I can't share the actual data (work) but I think the following might make things a little more clear. If not, then I will live happily with the solution that I am currently constructing.

      Mock up of the log file that I am working with:

      2011-04-28 13:25:47 INFO [main:114] <Message><Tag attribute="value">Answer</Tag></Message>
      2011-04-28 13:45:12 DEBUG [Populate::List:31] <Message><Tag attribute="value">Answer</Tag></Message>

      In other words, a standard Log4J log where the logged message is an XML document. I am parsing the log with something similar to the code below:

      while (<$fh>) {
          chomp;
          my ($date, $time, $log_lvl, $trace, $xml) = split ' ', $_, 5;
      }

      For each XML document, I need to convert it to a Perl data structure and do something with it. That would look something like:

      my $twig = XML::Twig->new();
      while (<$fh>) {
          chomp;
          my ($date, $time, $log_lvl, $trace, $xml) = split ' ', $_, 5;
          my %data_structure;
          $twig->parse($xml);
          # Build up %data_structure using $twig
      }
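
      Just to flesh out the comment above, the simplest possible "build up" might be (untested, and it assumes each child of the root maps straight onto a hash key):

      # Key the hash by tag name, one entry per child of the root element.
      for my $child ($twig->root->children) {
          $data_structure{ $child->tag } = $child->text;
      }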

      I could easily change this code to be "elegant", like so:

      while (<$fh>) {
          chomp;
          my ($date, $time, $log_lvl, $trace, $xml) = split ' ', $_, 5;
          my $data_structure = extract_data($xml);
      }

      sub extract_data {
          my ($xml) = @_;
          my $data = {};
          my $twig = XML::Twig->new(
              twig_handlers => {
                  Message => sub { handle_message(@_, $data) },
              },
          );
          $twig->parse($xml);
          return $data;
      }

      sub handle_message {
          # ...
      }

      There is absolutely nothing wrong with this, and I haven't profiled it to show that it isn't fast enough, but performance is my concern: I would like to inline as much as possible. Now that I have laid it all out, I realize that if someone else were asking this question, I would tell them to quit being falsely lazy, write it in a clear, maintainable way, profile it, and only worry about performance if it turned out to be unacceptable.
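
      If I ever do profile it, a rough comparison along these lines (untested, with a made-up sample message) would at least tell me whether reusing one twig object matters compared to building a new one per message:

      use Benchmark qw(cmpthese);
      use XML::Twig;

      my $xml    = '<Message><Tag attribute="value">Answer</Tag></Message>';
      my $reused = XML::Twig->new();

      # Compare reusing a single twig against constructing a fresh one each time.
      cmpthese( -5, {
          reuse_twig => sub { $reused->parse($xml) },
          new_twig   => sub { XML::Twig->new()->parse($xml) },
      });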

      Cheers - L~R