in reply to XML::SAX::ParserFactory policy and differences between parser implementations
I am trying to parse some big xml files while not eating all the user memory, so XML::SAX::Parser seems to be the solution.
The solution is called XML::Twig, see http://xmltwig.org/tutorial/
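Update: it seems you're already aware of XML::Twig.
Anyway, the docs aren't clear about what is supposed to be going on, but the information is out there :) Use an xml_decl handler. The script below runs every installed parser against the same files; in the output after __END__, XML::SAX::Expat reports the declaration in start_document, while the two XML::LibXML parsers report it in xml_decl.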
#!/usr/bin/perl --
use strict;
use warnings;
use XML::SAX;
use Module::Load qw/ load /;

my @files = ( ... );

my $parsers = XML::SAX->parsers();
for my $parser ( @$parsers ){
    load( $parser->{Name} );
    print "\n$parser->{Name}\n";
    for my $file ( @files ){
        $parser->{Name}->new(
            Handler => MySAXHandler->new,
        )->parse_file( $file );
    }
}

package MySAXHandler;
use base qw( XML::SAX::Base );
use Data::Dump qw/ pp /;

sub start_document { _pper('doc', @_ ) }
sub start_dtd      { _pper('dtd', @_ ) }
sub xml_decl       { _pper('decl', @_ ) }

sub _pper {
    my ($name, $self, $doc) = @_;
    print " $name ", pp( %$doc ), "\n";
}

__END__

XML::SAX::Expat
 doc ("Version", "1.0", "Encoding", "UTF-8", "Standalone", "")
 doc ("Version", "1.0", "Encoding", "ISO-8859-1", "Standalone", "")
 doc ()

XML::LibXML::SAX::Parser
 doc ()
 decl ("Version", "1.0", "Encoding", "UTF-8")
 doc ()
 decl ("Version", "1.0", "Encoding", "ISO-8859-1")
 doc ()
 decl ("Version", "1.0", "Encoding", undef)

XML::LibXML::SAX
 doc ()
 decl ("Version", "1.0")
 doc ()
 decl ("Version", "1.0", "Encoding", "ISO-8859-1")
 doc ()
 decl ("Version", "1.0")
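Since the output above shows the declaration arriving in start_document for one parser and in xml_decl for the others, one way to cope is a handler that accepts it from either event. This is only a rough sketch: the package name DeclCollector and its decl accessor are made up for illustration, and the events are fed in by hand (mimicking the XML::LibXML::SAX::Parser output above) so it runs without any input file.

```perl
#!/usr/bin/perl
use strict;
use warnings;

package DeclCollector;    # hypothetical name, not from the script above
use base qw( XML::SAX::Base );

# Copy whichever declaration fields an event carries into one record.
sub _merge {
    my ($self, $data) = @_;
    return unless ref $data eq 'HASH';
    for my $key (qw( Version Encoding Standalone )) {
        $self->{decl}{$key} = $data->{$key} if defined $data->{$key};
    }
    return;
}

# Some parsers pass the declaration here ...
sub start_document { my ($self, $doc)  = @_; $self->_merge($doc) }
# ... and others pass it here.
sub xml_decl       { my ($self, $decl) = @_; $self->_merge($decl) }

# Accessor for whatever was collected (empty hash if nothing was seen).
sub decl { $_[0]{decl} || {} }

package main;

my $handler = DeclCollector->new;

# Simulated events, shaped like the XML::LibXML::SAX::Parser output above.
$handler->start_document( {} );
$handler->xml_decl( { Version => '1.0', Encoding => 'ISO-8859-1' } );

my $decl = $handler->decl;
printf "%s: %s\n", $_, $decl->{$_} // '(not reported)'
    for qw( Version Encoding Standalone );
```

With a real parser you would pass the same object as Handler, as in the script above, and read $handler->decl after parse_file returns. Fields a parser never reports (for example Encoding under XML::LibXML::SAX when the file does not declare one) simply stay unset.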