OK, so as a follow-up, here is the simplest test I found that triggers the segfault:
    #!/usr/bin/perl
    use strict;
    use warnings;
    use threads;
    use XML::Twig;

    foreach my $i (1..2) {
        warn "creating thread $i\n";
        my $thread = threads->new(\&create_twig);
        $thread->join;
        sleep 1;
    }

    sub create_twig {
        warn "  creating twig\n";
        my $twig = XML::Twig->new( protocol_encoding => "x-sjis-unicode")
                            ->safe_parse( '<doc/>');
    }
The problem happens only when I add the protocol_encoding=>"x-sjis-unicode" option.
At this point it is worth checking whether the problem lies with XML::Twig, or with the underlying module, XML::Parser:
    #!/usr/bin/perl
    use strict;
    use warnings;
    use threads;
    use XML::Parser;

    foreach my $i (1..3) {
        warn "creating thread $i\n";
        my $thread = threads->new(\&create_parser);
        $thread->join;
        sleep 1;
    }

    sub create_parser {
        warn "  creating parser\n";
        my $parser = XML::Parser->new( ProtocolEncoding => "x-sjis-unicode");
        $parser->parse( '<doc/>');
    }
This code crashes too!
So on one hand, the problem is not in XML::Twig, so I am sort of off the hook ;--). On the other hand, this doesn't help you much :--(
Is there any way you could pre-process your data to make it UTF-8? That would simplify the processing and make it cleaner. In any case, having to use the protocol_encoding option at all is quite shady. You don't even have to keep the "fixed" data around: you can open a pipe that changes both the encoding (iconv is great for that) and the encoding declaration, process the result, then convert the data back (if needed) when you output it. Does that make sense?
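To make the pre-processing idea concrete, here is a minimal sketch. It uses the core Encode module instead of an iconv pipe, only so that it stays self-contained. The helper name sjis_xml_to_utf8 is made up for this example. The 'shiftjis' default is an assumption too: depending on where your data comes from, 'cp932' may be a closer match to what x-sjis-unicode maps.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Encode qw(decode encode);

# Hypothetical helper (not part of XML::Twig or XML::Parser):
# takes raw Shift_JIS XML bytes and returns UTF-8 bytes, with the
# encoding declaration rewritten to match.
sub sjis_xml_to_utf8 {
    my ($bytes, $source_encoding) = @_;
    $source_encoding ||= 'shiftjis';   # assumption: 'cp932' may fit your data better

    # Decode to Perl characters; FB_CROAK makes malformed input die
    # instead of being replaced silently.
    my $text = decode($source_encoding, $bytes, Encode::FB_CROAK);

    # Rewrite the encoding in the XML declaration, if there is one.
    $text =~ s/\A(<\?xml[^>]*?encoding\s*=\s*)(["'])[^"']*\2/${1}${2}UTF-8${2}/;

    # Back to bytes, now UTF-8.
    return encode('UTF-8', $text);
}
```

The returned string can then go to XML::Twig->new->safe_parse( $utf8_xml) with no protocol_encoding option at all, which should sidestep the encoding map that seems to be involved in the crash. I have not checked that this avoids the segfault under threads, so treat it as something to try rather than a confirmed fix.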
In reply to Re: Script crashes when parsing XML
by mirod
in thread Script crashes when parsing XML
by jabarin