in reply to Re: Convert XML To Perl Data Structures Using XML::Twig
in thread Convert XML To Perl Data Structures Using XML::Twig

mirod,
I love closures. See How A Function Becomes Higher Order, and Understanding And Using Iterators for examples ;-)

I believe I did a poor job of explaining my goal and my hang-up. I am processing a log with millions of XML messages, and each message must be converted to a distinct Perl data structure. While I can see several ways of accomplishing this, none of them seems to let me have my cake and eat it too.

To use a closure in the way you describe, I would need a factory to create a brand-new closure for each message, and then either instantiate a new XML::Twig object per message or call $twig->setTwigHandlers() between each call to $twig->parse(). The alternative would be to leave the XML::Twig object alone and perform a deep copy and "reset" of the reference that was closed over between messages.
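
To make that concrete, here is an untested sketch of the factory version I have in mind; the <Tag> handler is just a placeholder, not the real message schema:

use XML::Twig;

# Build a fresh handler set that closes over this message's target hashref.
sub make_handlers {
    my ($msg) = @_;
    return {
        Tag => sub {
            my ($twig, $elt) = @_;
            $msg->{ $elt->tag } = $elt->text;
        },
    };
}

my $twig = XML::Twig->new();
while (my $xml = <$fh>) {
    chomp $xml;
    my $msg = {};
    # Swap in new closures before every parse.
    $twig->setTwigHandlers( make_handlers($msg) );
    $twig->parse($xml);
    # do something with $msg
}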

My compromise, which I am fine with, was to write my own dispatch table so that I could do something akin to:

# ...
my $twig = XML::Twig->new();
while (<$fh>) {
    chomp;
    my $msg = {};
    $twig->parse($_);
    for my $child ($twig->root->children) {
        my $handler = $child->tag;
        if ($dispatch{$handler}) {
            $dispatch{$handler}->($child, $msg);
        }
        else {
            die "Haven't written handler for '$handler' yet";
        }
    }
    # do something with $msg
}
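
For reference, a single %dispatch entry would look something like this (the element and attribute names are placeholders, not the real schema):

my %dispatch = (
    Tag => sub {
        my ($child, $msg) = @_;
        # Pull whatever this element carries into the message structure.
        $msg->{ $child->att('attribute') } = $child->text;
    },
);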

I asked for advice here to make sure I wasn't missing anything obvious. I will certainly check out XML::Twig's simplify() method.
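
For anyone following along, simplify() should turn a parsed message into a plain data structure in one call; roughly (untested, options not tuned):

# Untested: let simplify() build the structure instead of hand-rolled handlers.
my $twig = XML::Twig->new();
$twig->parse($xml);
my $data = $twig->root->simplify( forcearray => 0, keyattr => [] );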

Cheers - L~R

Re^3: Convert XML To Perl Data Structures Using XML::Twig
by Jenda (Abbot) on May 25, 2011 at 15:18 UTC

    Show me a few example messages and the desired data structure, and let's see how it goes with XML::Rules ... and whether you like the resulting code. This really looks like a perfect task for that module.

    Jenda
    Enoch was right!
    Enjoy the last years of Rome.

Re^3: Convert XML To Perl Data Structures Using XML::Twig
by mirod (Canon) on May 25, 2011 at 14:09 UTC
      mirod,
      I can't share the actual data (work) but I think the following might make things a little more clear. If not, then I will live happily with the solution that I am currently constructing.

      Mock up of the log file that I am working with:

      2011-04-28 13:25:47 INFO [main:114] <Message><Tag attribute="value">Answer</Tag></Message>
      2011-04-28 13:45:12 DEBUG [Populate::List:31] <Message><Tag attribute="value">Answer</Tag></Message>

      In other words, a standard Log4J log where the logged message is an XML document. I am parsing the log with something similar to the code below:

      while (<$fh>) {
          chomp;
          my ($date, $time, $log_lvl, $trace, $xml) = split ' ', $_, 5;
      }

      For each XML document, I need to convert it to a Perl data structure and do something with it. That would look something like:

      my $twig = XML::Twig->new();
      while (<$fh>) {
          chomp;
          my ($date, $time, $log_lvl, $trace, $xml) = split ' ', $_, 5;
          my %data_structure;
          $twig->parse($xml);
          # Build up %data_structure using $twig
      }
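
      Just to flesh out the comment above, the simplest possible "build up" might be (untested, and it assumes each child of the root maps straight onto a hash key):

      # Key the hash by tag name, one entry per child of the root element.
      for my $child ($twig->root->children) {
          $data_structure{ $child->tag } = $child->text;
      }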

      I could easily change this code to be "elegant", like so:

      while (<$fh>) {
          chomp;
          my ($date, $time, $log_lvl, $trace, $xml) = split ' ', $_, 5;
          my $data_structure = extract_data($xml);
      }

      sub extract_data {
          my ($xml) = @_;
          my $data = {};
          my $twig = XML::Twig->new(
              twig_handlers => {
                  Message => sub { handle_message(@_, $data) },
              },
          );
          $twig->parse($xml);
          return $data;
      }

      sub handle_message {
          # ...
      }

      There is absolutely nothing wrong with this, and I haven't profiled it to show that it isn't fast enough, but performance is my concern: I would like to inline as much as possible. Now that I have laid it all out, I realize that if someone else were asking this question, I would tell them to quit being falsely lazy, write it in a clear, maintainable way, profile it, and only worry about performance if it turned out to be unacceptable.
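
      If I ever do profile it, a rough comparison along these lines (untested, with a made-up sample message) would at least tell me whether reusing one twig object matters compared to building a new one per message:

      use Benchmark qw(cmpthese);
      use XML::Twig;

      my $xml    = '<Message><Tag attribute="value">Answer</Tag></Message>';
      my $reused = XML::Twig->new();

      # Compare reusing a single twig against constructing a fresh one each time.
      cmpthese( -5, {
          reuse_twig => sub { $reused->parse($xml) },
          new_twig   => sub { XML::Twig->new()->parse($xml) },
      });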

      Cheers - L~R