in reply to Re: XML::SAX::ParserFactory policy and differences between parser implementations
Yes, I am aware of XML::Twig, but it does not suit my needs (or at least I did not see how I could use it): I need to "patch" an already parsed element to adjust its value while parsing and splitting a big block of elements that I prefer not to keep in memory.
As you mention yourself in your results, the different SAX parsers are not consistent with regard to the SAX events: at least XML::SAX::Expat puts the encoding into the start_document() data instead of the xml_decl() data, and XML::SAX::PurePerl does not fire xml_decl() at all.
Also, I do not get the same results as you with my test program and data. Could you check with which file XML::LibXML::SAX manages to give you an encoding? You can see that it does not with my UTF-8 sample.
data.xml
<?xml version="1.0" encoding="UTF-8" ?>
<root>
  <foo>
    <bar attr="baz">héhé mes 2 €</bar>
    <baz other="dummy"/>
  </foo>
</root>
test_sax.pl
use strict;
use warnings;
use feature 'say';
#~ use Say; #portability trick for 5.8.8
use XML::SAX::ParserFactory;
use XML::SAX::Writer;

my $input = $ARGV[0] or die "usage: $0 <file.xml> [parser_package]";
# an optional second argument forces a specific parser implementation
$XML::SAX::ParserPackage = $ARGV[1] if $ARGV[1];

my $output; #just for not outputting to STDOUT
my $writer  = XML::SAX::Writer->new( Output => \$output );
my $handler = SaxHandler->new( Handler => $writer );
my $parser  = XML::SAX::ParserFactory->parser( Handler => $handler );
say sprintf "parser is %s (%s)", ref $parser, $parser->VERSION ;
$parser->parse_file($input);

{
    package SaxHandler;
    use base 'XML::SAX::Base';
    use Data::Printer {indent=>2};
    use feature 'say';
    #~ use Say; #portability trick for 5.8.8

    # dump what the parser reports for the XML declaration, then pass it on
    sub xml_decl {
        my ($self, $decl) = @_;
        say "decl ", np $decl;
        $self->SUPER::xml_decl($decl);
    }

    # dump what the parser reports at document start, then pass it on
    sub start_document {
        my ($self, $doc) = @_;
        say "document ", np $doc;
        $self->SUPER::start_document($doc);
    }

    sub start_element {
        my ($self, $el) = @_;
        #~ say "start element " . $el->{LocalName};
        $self->SUPER::start_element($el);
    }
}
my results:
macbookseb:perl seb$ perl -v
This is perl 5, version 22, subversion 1 (v5.22.1) built for darwin-thread-multi-2level[...]

macbookseb:perl seb$ perl test_sax.pl data.xml XML::SAX::PurePerl
parser is XML::SAX::PurePerl (0.99)
document \ {}

macbookseb:perl seb$ perl test_sax.pl data.xml XML::SAX::Expat
parser is XML::SAX::Expat (0.51)
document \ {
  Encoding "UTF-8",
  Standalone "",
  Version 1.0
}

macbookseb:perl seb$ perl test_sax.pl data.xml XML::LibXML::SAX
parser is XML::LibXML::SAX (2.0124)
document \ {}
decl \ {
  Version 1.0
}

macbookseb:perl seb$ perl test_sax.pl data.xml XML::LibXML::SAX::Parser
parser is XML::LibXML::SAX::Parser (2.0124)
document \ {}
decl \ {
  Encoding "UTF-8",
  Version 1.0
}
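Given these differences, one possible way to cope on the handler side is to pick up the encoding from whichever event happens to carry it. The following is only a minimal sketch, not something provided by XML::SAX: the EncodingTracker package and its encoding() accessor are hypothetical names, and the event payloads are simply replayed by hand from the results above, so the sketch runs without any XML module installed.

```perl
use strict;
use warnings;

# Hypothetical helper, NOT part of XML::SAX: remembers the first non-empty
# Encoding value seen in either start_document() or xml_decl() data.
package EncodingTracker;

sub new {
    my ($class) = @_;
    return bless { encoding => undef }, $class;
}

# XML::SAX::Expat reported the encoding here in the results above.
sub start_document {
    my ($self, $doc) = @_;
    $self->_remember($doc);
    return;
}

# XML::LibXML::SAX::Parser reported the encoding here instead.
sub xml_decl {
    my ($self, $decl) = @_;
    $self->_remember($decl);
    return;
}

sub _remember {
    my ($self, $data) = @_;
    return if defined $self->{encoding};      # keep the first value seen
    return unless ref($data) eq 'HASH';
    my $enc = $data->{Encoding};
    $self->{encoding} = $enc if defined($enc) && length($enc);
    return;
}

# Returns undef when no event carried an encoding.
sub encoding {
    my ($self) = @_;
    return $self->{encoding};
}

package main;

# Event payloads copied by hand from the four runs above (assumption: these
# are the only relevant events each parser sends for data.xml).
my @replays = (
    [ 'XML::SAX::PurePerl',
      [ start_document => {} ] ],
    [ 'XML::SAX::Expat',
      [ start_document => { Encoding => 'UTF-8', Standalone => '', Version => '1.0' } ] ],
    [ 'XML::LibXML::SAX',
      [ start_document => {} ],
      [ xml_decl => { Version => '1.0' } ] ],
    [ 'XML::LibXML::SAX::Parser',
      [ start_document => {} ],
      [ xml_decl => { Encoding => 'UTF-8', Version => '1.0' } ] ],
);

for my $replay (@replays) {
    my ($parser, @events) = @$replay;
    my $tracker = EncodingTracker->new;
    for my $event (@events) {
        my ($method, $data) = @$event;
        $tracker->$method($data);
    }
    my $enc = $tracker->encoding;
    printf "%-26s %s\n", $parser, defined($enc) ? $enc : '(not reported)';
}
```

In a real handler the same two methods would presumably live in the XML::SAX::Base subclass and still call SUPER::start_document() and SUPER::xml_decl(), as test_sax.pl does. This does not help with XML::SAX::PurePerl or XML::LibXML::SAX, which reported no encoding at all for my sample.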