I'm working on a script that pulls rrd xport data from a web server. I'm using LWP::UserAgent to retrieve the data and XML::LibXML::SAX to parse it, with an LWP::UserAgent callback passing each incoming chunk of XML to the SAX parser. Some of the xports can be larger than 100 MB, so storing all of the XML before parsing it would be a waste of space. The XML parser builds a data structure out of the XML data and returns it to main.

What I am having trouble with is finding the best way to pass variables and objects around between these different handlers. Whenever I write a sub, I keep it self-contained: all variables and objects that the sub works on are explicitly passed to it and returned from it.

The first issue is passing the parser object to the LWP::UserAgent callback. The second issue is storing the data structure built by the SAX parser.

For the LWP::UserAgent callback, I don't see a way to pass additional variables. SAX is so new to me that I'm not sure where to begin.
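To make the first issue concrete, here is a minimal sketch of one option: building the callback as a closure, so that the parser is captured by the sub instead of being passed as an extra argument. LWP::UserAgent calls the callback with the chunk of data, the response object, and the protocol object. The names StubParser and make_chunk_callback are hypothetical and only exist for this illustration; StubParser stands in for the real SAX parser so the sketch runs without a web server.

```perl
use strict;
use warnings;

# Hypothetical stand-in for the SAX parser; only parse_chunk() matters here.
package StubParser;

sub new {
    my ($class) = @_;
    return bless { chunks => [] }, $class;
}

sub parse_chunk {
    my ($self, $chunk) = @_;
    push @{ $self->{chunks} }, $chunk;    # a real parser would parse here
    return 1;
}

package main;

# Hypothetical helper: returns a callback that closes over $parser,
# so no file-scoped parser variable is needed.
sub make_chunk_callback {
    my ($parser) = @_;
    return sub {
        my ($chunk, $response, $protocol) = @_;    # arguments LWP supplies
        $parser->parse_chunk($chunk);
        return 1;
    };
}

my $parser   = StubParser->new;
my $callback = make_chunk_callback($parser);

# Simulate LWP delivering the document in three chunks.
$callback->($_) for '<xport>', '<meta/>', '</xport>';

print scalar @{ $parser->{chunks} }, " chunks seen\n";
```

With the real objects, the request would presumably become `$ua->request($request, make_chunk_callback($p))`, which keeps the sub self-contained.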

I'm open to all suggestions and critiques (especially with SAX).

Below are a working XML example and the working code (still pretty ugly).

<xport>
  <meta>
    <start>1020611700</start>
    <step>300</step>
    <end>1020615600</end>
    <rows>14</rows>
    <columns>2</columns>
    <legend>
      <entry>out bytes</entry>
      <entry>in and out bits</entry>
    </legend>
  </meta>
  <data>
    <row><t>1020611700</t><v>3.4000000000e+00</v><v>5.4400000000e+01</v></row>
    <row><t>1020612000</t><v>3.4000000000e+00</v><v>5.4400000000e+01</v></row>
    <row><t>1020612300</t><v>3.4000000000e+00</v><v>5.4400000000e+01</v></row>
    <row><t>1020612600</t><v>3.4113333333e+00</v><v>5.4581333333e+01</v></row>
    <row><t>1020612900</t><v>3.4000000000e+00</v><v>5.4400000000e+01</v></row>
    <row><t>1020613200</t><v>3.4000000000e+00</v><v>5.4400000000e+01</v></row>
    <row><t>1020613500</t><v>3.4000000000e+00</v><v>5.4400000000e+01</v></row>
    <row><t>1020613800</t><v>3.4000000000e+00</v><v>5.4400000000e+01</v></row>
    <row><t>1020614100</t><v>3.4000000000e+00</v><v>5.4400000000e+01</v></row>
    <row><t>1020614400</t><v>3.4000000000e+00</v><v>5.4400000000e+01</v></row>
    <row><t>1020614700</t><v>3.7333333333e+00</v><v>5.9733333333e+01</v></row>
    <row><t>1020615000</t><v>3.4000000000e+00</v><v>5.4400000000e+01</v></row>
    <row><t>1020615300</t><v>3.4000000000e+00</v><v>5.4400000000e+01</v></row>
    <row><t>1020615600</t><v>NaN</v><v>NaN</v></row>
  </data>
</xport>
#!/usr/bin/perl -w
use strict;
use warnings;
use XML::SAX;
use XML::SAX::ParserFactory;
use Data::Dumper;
$Data::Dumper::Sortkeys = 1;
$Data::Dumper::Indent   = 1;
use LWP::UserAgent;

# File-scoped result structure; EventRecorder (below) fills it in directly.
my %data;
$data{values} = {};

my $factory = XML::SAX::ParserFactory->new;
$XML::SAX::ParserPackage = "XML::LibXML::SAX::Better";
$factory->require_feature('http://xml.org/sax/features/namespaces');

# now we do the way we want, sending chunks:
my $streamed_events;
my $handler = EventRecorder->new(\$streamed_events);
my $p       = $factory->parser(Handler => $handler);

my $epoch = time;
my $url   = 'http://localhost/rrd_compare.xml';
my $xml   = httpgetxml($url);

sub httpgetxml {
    my $url     = shift;
    my $ua      = LWP::UserAgent->new;
    my $request = HTTP::Request->new(GET => $url);
    my $stuff   = $ua->request($request, \&parseXenXMLchunk);
}

# LWP content callback; relies on the file-scoped $p.
sub parseXenXMLchunk {
    my ($data, $res, $req) = @_;
    $p->parse_chunk($data);
    return 1;
}

print Dumper \%data;

package EventRecorder;
use strict;
use base qw(XML::SAX::Base);

sub new {
    my ($class, $outref) = @_;
    $$outref = "";
    # Bless into the requested class rather than the current package.
    return bless { outref => $outref, }, $class;
}

sub start_element {
    my ($self, $data) = @_;
}

# Accumulate character data until the enclosing element ends.
sub characters {
    my $self = shift;
    my $text = shift;
    $self->{text} .= $text->{Data};
}

sub end_element {
    my $self = shift;
    my $data = shift;
    my $text = $self->get_text();

    # To be cleaned up later
    $text =~ s/\n//g;
    $text =~ s/^\s+//;
    $text =~ s/\s+$//;
    $text =~ s/\s+/ /g;    # collapse every run of whitespace, not just the first

    my $local_name = $data->{LocalName};
    if ($local_name eq "step") {
        $data{$local_name} = $text;
    }
    elsif ($local_name eq "entry") {
        push @{$data{datasource}}, $text;
    }
    elsif ($local_name eq "t") {
        $data{lasttime} = $text;
    }
    elsif ($local_name eq "v") {
        push @{$data{values}{$data{lasttime}}}, $text;
    }
}

# Return the accumulated text and reset the buffer.
sub get_text {
    my $self = shift;
    my $text = '';
    if ( defined( $self->{text} ) ) {
        $text = $self->{text};
        $self->{text} = '';
    }
    return $text;
}

# XML::LibXML::SAX::Better an extended SAX handler by Djabberd project
package XML::LibXML::SAX::Better;
use strict;
use vars qw($VERSION @ISA);
$VERSION = '1.00';
use XML::LibXML;
use XML::SAX::Base;
use base qw(XML::SAX::Base);

sub new {
    my ($class, @params) = @_;
    my $inst = $class->SUPER::new(@params);

    my $libxml = XML::LibXML->new;
    $libxml->set_handler( $inst );
    $inst->{LibParser} = $libxml;

    # setup SAX. 1 means "with SAX"
    $libxml->_start_push(1);
    $libxml->init_push;

    return $inst;
}

sub parse_chunk {
    my ( $self, $chunk ) = @_;
    my $libxml = $self->{LibParser};
    my $rv = $libxml->push($chunk);
}

sub finish_push {
    my $self = shift;
    return 1 unless $self->{LibParser};
    my $parser = delete $self->{LibParser};
    return eval { $parser->finish_push };
}

1;
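For the second issue, the EventRecorder above writes straight into the file-scoped %data. A minimal sketch of an alternative, assuming the handler is allowed to own its state, is to keep the structure inside the handler object and read it back through an accessor once parsing is done. RowCollector and result are hypothetical names for this illustration. To keep the sketch runnable without a parser, it does not inherit from XML::SAX::Base and the events are fed in by hand, using the same hash shapes ({Data => ...} and {LocalName => ...}) that the handler methods above receive.

```perl
use strict;
use warnings;

# Hypothetical handler that keeps all of its state in $self.
package RowCollector;

sub new {
    my ($class) = @_;
    return bless { text => '', result => { values => {} } }, $class;
}

# Accumulate character data until the enclosing element ends.
sub characters {
    my ($self, $chars) = @_;
    $self->{text} .= $chars->{Data};
}

sub end_element {
    my ($self, $el) = @_;
    (my $text = $self->{text}) =~ s/^\s+|\s+$//g;    # trim both ends
    $self->{text} = '';

    my $result = $self->{result};
    my $name   = $el->{LocalName};
    if    ($name eq 'step')  { $result->{step} = $text }
    elsif ($name eq 'entry') { push @{ $result->{datasource} }, $text }
    elsif ($name eq 't')     { $self->{lasttime} = $text }
    elsif ($name eq 'v')     { push @{ $result->{values}{ $self->{lasttime} } }, $text }
}

# Accessor: the caller asks the handler for the finished structure.
sub result { return $_[0]{result} }

package main;

my $handler = RowCollector->new;

# Feed a few events by hand, in the order a parser would emit them.
my @events = (
    [ step  => '300' ],
    [ entry => 'out bytes' ],
    [ t     => '1020611700' ],
    [ v     => '3.4000000000e+00' ],
    [ v     => '5.4400000000e+01' ],
);
for my $event (@events) {
    $handler->characters({ Data => $event->[1] });
    $handler->end_element({ LocalName => $event->[0] });
}

my $data = $handler->result;
print "step=$data->{step} rows=", scalar keys %{ $data->{values} }, "\n";
```

The design choice here is that main keeps a reference to the handler it created and calls the accessor after the last chunk has been parsed, so nothing outside the handler is modified while parsing.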
Thanks

In reply to Handler semantics by msalerno
