Sixtease has asked for the wisdom of the Perl Monks concerning the following question:
I keep getting segfaults when I try to parse an XML file containing Czech diacritic characters encoded in UTF-8. This happens with every parser package I have tried except XML::SAX::PurePerl, and only for XML files larger than about 2 KB. I'm including the code that triggers the fault and a link to the minimal XML file that reproduces it (if I delete a single line from that file, the script runs normally).
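To narrow down the roughly 2 KB threshold without hand-editing the file, one could generate test inputs of increasing size. This is a hypothetical sketch: the `make_xml`/`write_xml` helpers, the `<doc>`/`<w>` element names and the Czech pangram used as filler are assumptions, not taken from the original train.m.xml. The chosen line counts should give files from well under to well over 2 KB.

```perl
#!/usr/bin/perl
# Hypothetical helper: writes UTF-8 XML files of increasing size, filled
# with Czech diacritics, to help bracket the size at which the crash starts.
use strict;
use warnings;
use utf8;           # the filler string below is written in UTF-8 in this source
use File::Spec;

# Assumption: a line of Czech text is representative of the real data.
my $line = "<w>Příliš žluťoučký kůň úpěl ďábelské ódy</w>\n";

# Build a complete document with $n_lines copies of the filler line.
sub make_xml {
    my ($n_lines) = @_;
    return qq{<?xml version="1.0" encoding="UTF-8"?>\n<doc>\n}
         . ($line x $n_lines)
         . "</doc>\n";
}

# Write the document as UTF-8 bytes and return the file size in bytes.
sub write_xml {
    my ($path, $n_lines) = @_;
    open my $out, '>:encoding(UTF-8)', $path or die "Cannot write $path: $!";
    print {$out} make_xml($n_lines);
    close $out or die "Cannot close $path: $!";
    return -s $path;
}

# Write files to the system temp directory and report their sizes.
for my $n (10, 20, 40, 80) {
    my $path  = File::Spec->catfile(File::Spec->tmpdir, "czech_$n.xml");
    my $bytes = write_xml($path, $n);
    print "$path: $n lines, $bytes bytes\n";
}
```

Each generated file can then be fed to the script below in place of train.m.xml to see which sizes crash.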
This has been happening since roughly July. Until now I worked around it by using PurePerl, but PurePerl now seems to have a bug of its own, so I decided to ask for help with this problem first.
My perl and machine are:
v5.8.8 built for x86_64-linux-thread-multi
Gentoo Linux for amd64 on Core2 Duo, Kernel 2.6.19 with Gentoo patches.
```perl
#!/usr/bin/perl

{
    # Minimal SAX handler: it defines no callbacks, so parse events are ignored.
    package Handler;
    use strict;
    use warnings;
    use encoding 'utf8';

    sub new { bless +{} }
}

use strict;
use warnings;
use encoding 'utf8';
use XML::SAX::ParserFactory;

# Force the libxml2-based SAX driver.
$XML::SAX::ParserPackage = "XML::LibXML::SAX";

open(my $file, '<:encoding(utf8)', 'train.m.xml')
    or die "Cannot open train.m.xml: $!";

my $parser = XML::SAX::ParserFactory->parser( "Handler" => Handler->new() );
$parser->parse_file($file);
```
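Because a segfault takes down the whole interpreter, comparing parsers is easier if each one runs in its own child process. A minimal sketch, assuming the package list below (adjust it to whatever is installed): it repeats the parsing part of the script above once per `$XML::SAX::ParserPackage` value and decodes `$?` so a crash (signal 11 is SIGSEGV on Linux) can be told apart from an ordinary error exit. The `describe_status` helper is hypothetical.

```perl
#!/usr/bin/perl
# Sketch: repeat the same parse once per SAX parser package, each in its own
# perl process, so a segfault kills only the child and can be reported.
use strict;
use warnings;

# Hypothetical helper: turn a wait status ($? after system) into text.
sub describe_status {
    my ($status) = @_;
    return 'failed to start' if $status == -1;
    my $signal = $status & 127;                  # low 7 bits: signal number
    my $core   = $status & 128 ? ' (core dumped)' : '';
    return "killed by signal $signal$core" if $signal;
    my $code = $status >> 8;                     # high byte: exit code
    return $code ? "exited with code $code" : 'ok';
}

# Assumptions: input file name and which parser packages are installed.
my $xml      = @ARGV ? shift @ARGV : 'train.m.xml';
my @packages = qw(XML::SAX::PurePerl XML::LibXML::SAX XML::SAX::Expat);

# Child program: mirrors the original script (methodless Handler object,
# file opened with an :encoding(utf8) layer, parse_file on the handle).
my $child = <<'CODE';
use XML::SAX::ParserFactory;
$XML::SAX::ParserPackage = $ARGV[0];
open my $fh, '<:encoding(utf8)', $ARGV[1] or die "Cannot open $ARGV[1]: $!";
my $parser = XML::SAX::ParserFactory->parser(Handler => bless({}, 'Handler'));
$parser->parse_file($fh);
CODE

for my $pkg (@packages) {
    system($^X, '-e', $child, $pkg, $xml);       # $^X is the running perl
    printf "%-20s %s\n", $pkg, describe_status($?);
}
```

Running the parse out of process also keeps Perl 5.8.8 compatibility (no `//` operator) and shows at a glance which drivers crash on the same input.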
Re: XML::SAX UTF-8 segfault
by Khen1950fx (Canon) on Jan 26, 2007 at 12:21 UTC
by Sixtease (Friar) on Jan 26, 2007 at 16:16 UTC

Re: XML::SAX UTF-8 segfault
by Anonymous Monk on Jan 26, 2007 at 07:10 UTC