mirod,
I can't share the actual data (work) but I think the following might make things a little more clear. If not, then I will live happily with the solution that I am currently constructing.

Mock up of the log file that I am working with:

2011-04-28 13:25:47 INFO [main:114] <Message><Tag attribute="value">An +swer</Tag></Message> 2011-04-28 13:45:12 DEBUG [Populate::List:31] <Message><Tag attribute= +"value">Answer</Tag></Message>

In other words, a Log4J standard log where the log entry is an XML document. I am parsing the log similar to the code below:

while (<$fh>) { chomp; my ($date, $time, $log_lvl, $trace, $xml) = split ' ', $_, 5; }

For each XML document, I need to convert it to a perl data structure and do something with it. That would look something like:

my $twig = XML::Twig->new(); while (<$fh>) { chomp; my ($date, $time, $log_lvl, $trace, $xml) = split ' ', $_, 5; my %data_structure; $twig->parse($xml); # Build up %data_structure using $twig }

I could easily change this code to be "elegant" as such:

while (<$fh>) { chomp; my ($date, $time, $log_lvl, $trace, $xml) = split ' ', $_, 5; my $data_structure = extract_data($xml); } sub extract_data { my ($xml) = @_; my $data = {}; my $twig = XML::Twig->new( twig_handlers => { Message => sub { handle_message(@_, $data) } } ); $twig->parse($xml); return $data; } sub handle_message { # ... }

There is absolutely nothing wrong with this and I haven't profiled it to see that it isn't fast enough but that is my concern. I would like to inline as much as possible. So now that I have laid it out there I realize if it were someone else asking this question I would tell them to quit being falsely lazy, write it in a clear maintainable way and profile it and only worry about performance if it was unacceptable.

Cheers - L~R


In reply to Re^4: Convert XML To Perl Data Structures Using XML::Twig by Limbic~Region
in thread Convert XML To Perl Data Structures Using XML::Twig by Limbic~Region

Title:
Use:  <p> text here (a paragraph) </p>
and:  <code> code here </code>
to format your post, it's "PerlMonks-approved HTML":



  • Posts are HTML formatted. Put <p> </p> tags around your paragraphs. Put <code> </code> tags around your code and data!
  • Titles consisting of a single word are discouraged, and in most cases are disallowed outright.
  • Read Where should I post X? if you're not absolutely sure you're posting in the right place.
  • Please read these before you post! —
  • Posts may use any of the Perl Monks Approved HTML tags:
    a, abbr, b, big, blockquote, br, caption, center, col, colgroup, dd, del, details, div, dl, dt, em, font, h1, h2, h3, h4, h5, h6, hr, i, ins, li, ol, p, pre, readmore, small, span, spoiler, strike, strong, sub, summary, sup, table, tbody, td, tfoot, th, thead, tr, tt, u, ul, wbr
  • You may need to use entities for some characters, as follows. (Exception: Within code tags, you can put the characters literally.)
            For:     Use:
    & &amp;
    < &lt;
    > &gt;
    [ &#91;
    ] &#93;
  • Link using PerlMonks shortcuts! What shortcuts can I use for linking?
  • See Writeup Formatting Tips and other pages linked from there for more info.