seki has asked for the wisdom of the Perl Monks concerning the following question:
I am trying to parse some big XML files without eating all of the user's memory, so XML::SAX::Parser seems to be the solution.
My files may contain various diacritics, so I need to preserve the files' UTF-8 encoding, but XML::SAX::ParserFactory (code taken from the XML::SAX::Parser examples) by default returns a parser that does not get the encoding from the document declaration.
I then discovered that there is more than one SAX parser on my system with

    # debug: list known parsers
    my $parsers = XML::SAX->parsers();
    say np $parsers;

and by accident, while testing all of them, I found that only XML::LibXML::SAX::Parser seems to be able to get the document encoding.
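For readers without Data::Printer (the source of `np` above), here is a minimal sketch of the same listing using only `print`. It assumes that each entry returned by `XML::SAX->parsers()` is a hash reference with `Name` and `Features` keys, which is how XML::SAX stores its registry in ParserDetails.ini.

```perl
use strict;
use warnings;
use XML::SAX;

# XML::SAX->parsers returns an array reference with one hash
# reference per parser registered in ParserDetails.ini.
my $parsers = XML::SAX->parsers();

for my $p (@$parsers) {
    print "$p->{Name}\n";

    # Features is a hash keyed by SAX feature URI; it may be empty.
    for my $feature (sort keys %{ $p->{Features} || {} }) {
        print "    $feature\n";
    }
}
```

As far as I can tell, when nothing else is configured the factory falls back to the last entry of this list, so the order shown here hints at which parser `XML::SAX::ParserFactory->parser` will return.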
I wonder why not all parser implementations are able to report all of the document properties, and how I am supposed to learn the differences other than by trial and error...
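The trial and error can at least be automated. The following is a sketch, not an authoritative test: `DeclProbe` is a hypothetical handler name, and the check assumes that a parser reports the declaration through the Perl SAX 2 `xml_decl` event with an `Encoding` key. It feeds the same small document to every registered parser and prints what each one reports.

```perl
use strict;
use warnings;
use XML::SAX;

# Hypothetical handler: it only remembers the Encoding value
# passed to the xml_decl event, if the parser fires that event.
package DeclProbe {
    use parent 'XML::SAX::Base';

    sub xml_decl {
        my ($self, $decl) = @_;
        $self->{encoding} = $decl->{Encoding};
    }
}

my $xml = qq{<?xml version="1.0" encoding="UTF-8"?>\n<doc/>};

for my $p (@{ XML::SAX->parsers() }) {
    my $class   = $p->{Name};
    my $handler = DeclProbe->new;

    # A registered parser may fail to load or to parse; trap that
    # so one broken entry does not stop the survey.
    my $ok = eval {
        (my $file = "$class.pm") =~ s{::}{/}g;
        require $file;
        $class->new(Handler => $handler)->parse_string($xml);
        1;
    };

    if (!$ok) {
        print "$class: failed: $@\n";
        next;
    }
    printf "%s: %s\n", $class, $handler->{encoding} // '(not reported)';
}
```

The output depends on which parsers are installed, so none is shown here.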
Also, why do I need to explicitly use XML::LibXML::SAX::Parser, when the documentation of XML::LibXML only mentions XML::LibXML::SAX, which misses the Encoding attribute of the XML declaration?
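One way to pin the implementation while still going through the factory is the `$XML::SAX::ParserPackage` variable described in the XML::SAX::ParserFactory documentation. The sketch below assumes XML::LibXML is installed; `EncodingReporter` is a hypothetical handler name used only for illustration.

```perl
use strict;
use warnings;
use XML::SAX::ParserFactory;

# Hypothetical handler that records the declared encoding.
package EncodingReporter {
    use parent 'XML::SAX::Base';

    sub xml_decl {
        my ($self, $decl) = @_;
        $self->{seen_encoding} = $decl->{Encoding};
    }
}

# Tell the factory which implementation to return instead of
# letting it pick its default.
local $XML::SAX::ParserPackage = 'XML::LibXML::SAX::Parser';

my $handler = EncodingReporter->new;
my $parser  = XML::SAX::ParserFactory->parser(Handler => $handler);

$parser->parse_string(qq{<?xml version="1.0" encoding="UTF-8"?>\n<doc/>});

print ref($parser), "\n";
print $handler->{seen_encoding} // '(no encoding reported)', "\n";
```

Note that the XML::LibXML::SAX::Parser documentation describes it as DOM based: it parses the whole document into a tree and then generates SAX events from it, so it may not give the memory savings that motivated using SAX for big files.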

Re: XML::SAX::ParserFactory policy and differences between parser implementations
by beech (Parson) on Mar 01, 2016 at 19:43 UTC
by seki (Monk) on Mar 02, 2016 at 01:54 UTC
by beech (Parson) on Mar 02, 2016 at 02:38 UTC
by seki (Monk) on Mar 02, 2016 at 02:57 UTC

Re: XML::SAX::ParserFactory policy and differences between parser implementations
by choroba (Cardinal) on Mar 03, 2016 at 19:13 UTC