OK, so as a follow-up, here is the simplest test I found that triggers the segfault:

#!/usr/bin/perl
use strict;
use warnings;
use threads;
use XML::Twig;

foreach my $i (1..2) {
    warn "creating thread $i\n";
    my $thread = threads->new(\&create_twig);
    $thread->join;
    sleep 1;
}

sub create_twig {
    warn " creating twig\n";
    my $twig = XML::Twig->new( protocol_encoding => "x-sjis-unicode")
                        ->safe_parse( '<doc/>');
}

The problem happens only when I add the protocol_encoding=>"x-sjis-unicode" option.

At this point it is worth checking whether the problem lies with XML::Twig, or with the underlying module, XML::Parser:

#!/usr/bin/perl
use strict;
use warnings;
use threads;
use XML::Parser;

foreach my $i (1..3) {
    warn "creating thread $i\n";
    my $thread = threads->new(\&create_parser);
    $thread->join;
    sleep 1;
}

sub create_parser {
    warn " creating parser\n";
    my $parser = XML::Parser->new( ProtocolEncoding => "x-sjis-unicode");
    $parser->parse( '<doc/>');
}

This code crashes too!

So on one hand, the problem is not in XML::Twig, so I am sort of off the hook ;--). On the other hand, this doesn't help you much :--(

Is there any way you could pre-process your data to convert it to UTF-8? That would simplify processing and make it cleaner. In any case, as it stands, having to use the protocol_encoding option is quite shady. You don't even have to keep the "fixed" data around: you can open a pipe that changes both the encoding (iconv is great for that) and the encoding declaration, process the result, then convert the data back (if needed) when you output it. Does that make sense?
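
To make the idea concrete, here is a minimal sketch of the conversion step. It uses Perl's core Encode module rather than an external iconv pipe, which is a substitution on my part. The helper name sjis_xml_to_utf8 is hypothetical, and the default source encoding 'cp932' is an assumption: I have not checked that it matches XML::Parser's x-sjis-unicode map character for character, so substitute whichever Encode name fits your data.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Encode qw(decode encode);

# Hypothetical helper: takes raw Shift-JIS XML bytes, returns UTF-8 XML bytes.
sub sjis_xml_to_utf8 {
    my ($bytes, $source_encoding) = @_;
    $source_encoding ||= 'cp932';    # assumption: adjust to match your data

    # Decode the bytes into Perl characters; FB_CROAK dies on malformed input
    # instead of silently inserting substitution characters.
    my $xml = decode($source_encoding, $bytes, Encode::FB_CROAK());

    # Rewrite the encoding declaration (if there is one) to match the new bytes.
    $xml =~ s/\A(<\?xml[^>]*?encoding\s*=\s*)(["'])[^"']*\2/$1${2}UTF-8$2/;

    # Encode the characters back to bytes, this time as UTF-8.
    return encode('UTF-8', $xml);
}
```

The returned string can then be handed to XML::Twig->new->safe_parse (or XML::Parser->new->parse) with no protocol_encoding option at all, which should sidestep the code path that crashes in the tests above. To write the data back out in the original encoding, call encode with the source encoding name on the decoded output.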


In reply to Re: Script crashes when parsing XML by mirod
in thread Script crashes when parsing XML by jabarin
