Yes, I am aware of XML::Twig, but it is not suitable for my needs (or at least I did not see how I could use it). I need to "patch" an already parsed element to adjust its value while parsing and splitting a big block of elements that I prefer not to keep in memory.
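For comparison, this kind of in-stream patching can be done with a plain SAX filter placed between the parser and XML::SAX::Writer. Below is a minimal sketch. It assumes the target element contains only text, like `bar` in the sample data below. `PatchTextFilter`, `TargetName` and `Patch` are hypothetical names chosen for illustration. Only the text of the element currently being patched is buffered, and every other event is forwarded immediately.

```perl
use strict;
use warnings;

# Hypothetical filter: rewrites the text of every element whose LocalName
# equals TargetName, using the Patch callback, while the document streams
# through. Assumes target elements contain text only (no child elements).
package PatchTextFilter;
use base 'XML::SAX::Base';

sub start_element {
    my ($self, $el) = @_;
    # Entering a target element: start buffering its text
    $self->{_buffer} = '' if $el->{LocalName} eq $self->{TargetName};
    $self->SUPER::start_element($el);
}

sub characters {
    my ($self, $data) = @_;
    if (defined $self->{_buffer}) {
        # Text may arrive in several chunks: hold it back until end_element
        $self->{_buffer} .= $data->{Data};
        return;
    }
    $self->SUPER::characters($data);
}

sub end_element {
    my ($self, $el) = @_;
    if (defined $self->{_buffer} && $el->{LocalName} eq $self->{TargetName}) {
        # Emit the patched text as one characters event, then close the element
        my $patched = $self->{Patch}->($self->{_buffer});
        $self->{_buffer} = undef;
        $self->SUPER::characters({ Data => $patched });
    }
    $self->SUPER::end_element($el);
}

# Wiring sketch for a script like the one in this post:
#   my $filter = PatchTextFilter->new(
#       Handler    => $writer,
#       TargetName => 'bar',
#       Patch      => sub { my $t = shift; $t =~ s/\x{20AC}/EUR/g; $t },
#   );
#   my $parser = XML::SAX::ParserFactory->parser(Handler => $filter);
```

Because the buffer is reset on each target element, memory use is bounded by the largest single patched text node, not by the document size.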

As you noted yourself in your results, the various SAX parsers are not consistent in the SAX events they emit. At least XML::SAX::Expat puts the encoding in the start_document() data instead of the xml_decl() data, and XML::SAX::PurePerl does not call xml_decl() at all.
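One way to cope with this inconsistency is to collect the declaration fields from whichever event happens to carry them. Below is a minimal sketch. It assumes the event data are plain hashrefs using the keys seen in the results in this post (`Version`, `Encoding`, `Standalone`). `merge_decl_info` is a hypothetical helper, not part of XML::SAX. It cannot help with a parser that reports nothing at all, as XML::SAX::PurePerl does here.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of XML::SAX): copy declaration fields from a
# start_document() or xml_decl() data hashref into an accumulator, keeping
# the first non-empty value seen for each field.
sub merge_decl_info {
    my ($acc, $data) = @_;
    return $acc unless ref $data eq 'HASH';
    for my $key (qw(Version Encoding Standalone)) {
        next if defined $acc->{$key} && length $acc->{$key};      # already known
        next unless defined $data->{$key} && length $data->{$key}; # nothing new
        $acc->{$key} = $data->{$key};
    }
    return $acc;
}

# In a handler such as SaxHandler, call it from both events, e.g.:
#   merge_decl_info($self->{decl} ||= {}, $doc);   # in start_document
#   merge_decl_info($self->{decl} ||= {}, $decl);  # in xml_decl
```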

Also, I do not get the same results as you with my test program and data. Could you check which file XML::LibXML::SAX manages to report an encoding for? As you can see, it does not with my UTF-8 sample.

data.xml

<?xml version="1.0" encoding="UTF-8" ?>
<root>
    <foo>
        <bar attr="baz">héhé mes 2 €</bar>
        <baz other="dummy"/>
    </foo>
</root>

test_sax.pl

use strict;
use warnings;
use feature 'say';
#~ use Say; #portability trick for 5.8.8
use XML::SAX::ParserFactory;
use XML::SAX::Writer;

my $input = $ARGV[0] or die "usage: $0 <file.xml> [parser_package]";
$XML::SAX::ParserPackage = $ARGV[1] if $ARGV[1];

my $output;    #just for not outputting to STDOUT
my $writer  = XML::SAX::Writer->new(Output => \$output);
my $handler = SaxHandler->new(Handler => $writer);
my $parser  = XML::SAX::ParserFactory->parser(Handler => $handler);
say sprintf "parser is %s (%s)", ref $parser, $parser->VERSION;
$parser->parse_file($input);

{
    package SaxHandler;
    use base 'XML::SAX::Base';
    use Data::Printer {indent=>2};
    use feature 'say';
    #~ use Say; #portability trick for 5.8.8

    # Dump the xml_decl() data, then pass the event on
    sub xml_decl {
        my ($self, $decl) = @_;
        say "decl ", np $decl;
        $self->SUPER::xml_decl($decl);
    }

    # Dump the start_document() data, then pass the event on
    sub start_document {
        my ($self, $doc) = @_;
        say "document ", np $doc;
        $self->SUPER::start_document($doc);
    }

    sub start_element {
        my ($self, $el) = @_;
        #~ say "start element " . $el->{LocalName};
        $self->SUPER::start_element($el);
    }
}

My results:

macbookseb:perl seb$ perl -v

This is perl 5, version 22, subversion 1 (v5.22.1) built for darwin-thread-multi-2level
[...]

macbookseb:perl seb$ perl test_sax.pl data.xml XML::SAX::PurePerl
parser is XML::SAX::PurePerl (0.99)
document \ {}

macbookseb:perl seb$ perl test_sax.pl data.xml XML::SAX::Expat
parser is XML::SAX::Expat (0.51)
document \ {
  Encoding "UTF-8",
  Standalone "",
  Version 1.0
}

macbookseb:perl seb$ perl test_sax.pl data.xml XML::LibXML::SAX
parser is XML::LibXML::SAX (2.0124)
document \ {}
decl \ {
  Version 1.0
}

macbookseb:perl seb$ perl test_sax.pl data.xml XML::LibXML::SAX::Parser
parser is XML::LibXML::SAX::Parser (2.0124)
document \ {}
decl \ {
  Encoding "UTF-8",
  Version 1.0
}
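Since neither XML::SAX::PurePerl nor XML::LibXML::SAX reports the encoding for this sample, one possible fallback is to read it straight from the file's byte order mark or XML declaration before parsing. Below is a minimal core-Perl sketch. `sniff_xml_encoding` is a hypothetical name. It only inspects the first 1024 bytes, does not distinguish UTF-32 from UTF-16, and does not handle EBCDIC input.

```perl
use strict;
use warnings;

# Hypothetical fallback: guess a file's encoding from its BOM or from the
# encoding="..." pseudo-attribute of its XML declaration.
sub sniff_xml_encoding {
    my ($file) = @_;
    open my $fh, '<:raw', $file or die "cannot open $file: $!";
    my $read = read($fh, my $head, 1024);
    close $fh;
    die "cannot read $file: $!" unless defined $read;

    # Byte order marks
    return 'UTF-8'    if $head =~ /\A\xEF\xBB\xBF/;
    return 'UTF-16BE' if $head =~ /\A\xFE\xFF/;
    return 'UTF-16LE' if $head =~ /\A\xFF\xFE/;

    # ASCII-compatible declaration, e.g. <?xml version="1.0" encoding="UTF-8" ?>
    return $2
        if $head =~ /\A<\?xml\s[^>]*?\bencoding\s*=\s*(["'])([A-Za-z][A-Za-z0-9._-]*)\1/;

    # No BOM and no encoding declaration: XML 1.0 defaults to UTF-8
    return 'UTF-8';
}
```

The result can be merged with whatever the chosen parser does report, so the handler behaves the same whichever ParserPackage is selected.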

In reply to Re^2: XML::SAX::ParserFactory policy and differences between parser implementations by seki
in thread XML::SAX::ParserFactory policy and differences between parser implementations by seki
