in reply to XML::Twig too many children?

The problem is not quite as trivial as you describe, as the following code demonstrates:

use warnings;
use strict;
use XML::Twig;

my $children = 10000;
my $found;
my $xmlStr = <<XML;
<XML>
@{["<lemma>1</lemma>\n" x $children]}
</XML>
XML

my $twig = XML::Twig->new(
    keep_encoding => 1,
    twig_handlers => {'lemma' => \&ProcessLemma}
);
$twig->parse($xmlStr);
print "Expected $children, found $found\n";

sub ProcessLemma {
    my ($XmlTwig, $XmlLemma) = @_;
    ++$found;
    $XmlLemma->purge;
    return 1;
}

Prints:

Expected 10000, found 10000

Perhaps you can use a generated XML test case like the one shown above to reproduce your issue?

True laziness is hard work

Re^2: XML::Twig too many children?
by robbv (Initiate) on Feb 21, 2012 at 20:33 UTC

    Your example works fine for me too (except that it won't swallow the <<XML; ... XML construction, so I rewrote it).

    However, it breaks if I have just one <lemma>-tag with many children. I find that the breaking point is at 4696/4697.

    use warnings;
    use strict;
    use XML::Twig;

    my $children = 4697;
    my $found;
    my $xmlStr = '<XML><lemma>'.join("\n",@{['<line>1</line>' x $children]}).'</lemma></XML>';

    my $twig = XML::Twig->new(
        keep_encoding => 1,
        twig_handlers => {'lemma' => \&ProcessLemma}
    );
    $twig->parse($xmlStr);
    print "Expected one, found $found\n";

    sub ProcessLemma {
        my ($XmlTwig, $XmlLemma) = @_;
        ++$found;
        $XmlLemma->purge;
        return 1;
    }

    Btw, I don't know if it matters, but I'm using Win32, ActivePerl 5.14.2.
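A breaking point like 4696/4697 can be located by bisection rather than by hand. The following is only a sketch: `last_working` and `probe` are names made up for illustration, and `twig_test.pl` is an assumed file name for a script like the one above that reads the child count from its first argument. Each probe runs in a separate perl process because a segfault takes the whole process down. The search assumes the outcome is monotonic in the child count, which may not hold exactly near the threshold.

```perl
use strict;
use warnings;

# Return the largest $n in [$lo, $hi) for which $works->($n) is true.
# Assumes $works->($lo) is true, $works->($hi) is false, and that the
# outcome does not flip back and forth in between.
sub last_working {
    my ($lo, $hi, $works) = @_;
    while ($hi - $lo > 1) {
        my $mid = int(($lo + $hi) / 2);
        if ($works->($mid)) { $lo = $mid } else { $hi = $mid }
    }
    return $lo;
}

# Hypothetical probe: run the test script in a child perl and treat a
# zero wait status as success. 'twig_test.pl' is an assumed file name.
sub probe {
    my ($n) = @_;
    system($^X, 'twig_test.pl', $n);
    return $? == 0;
}

# Example call (needs twig_test.pl and XML::Twig, so it is left commented out):
# print last_working(1, 100_000, \&probe), "\n";
```

Passing the predicate as a code reference keeps the search separate from the probe, so the search can be checked with a fake predicate before running the real, slow one.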

      5.14.2 (i686-linux-thread-multi), XML::Twig 3.39, XML::Parser 2.41.

      Died of a segfault with a sufficiently large number.

      Stack trace:

      Program received signal SIGSEGV, Segmentation fault.
      0x0807632d in Perl_call_sv ()
      (gdb) bt
      #0 0x0807632d in Perl_call_sv ()
      #1 0x080e959d in Perl_sv_clear ()
      #2 0x080e9c8a in Perl_sv_free2 ()
      #3 0x080d7644 in Perl_hv_free_ent ()
      #4 0x080d8bf3 in S_hfreeentries ()
      #5 0x080db12e in Perl_hv_undef_flags ()
      #6 0x080e97cb in Perl_sv_clear ()
      #7 0x080e9c8a in Perl_sv_free2 ()
      #8 0x080d7644 in Perl_hv_free_ent ()
      #9 0x080d8bf3 in S_hfreeentries ()
      #10 0x080db12e in Perl_hv_undef_flags ()
      #11 0x080e97cb in Perl_sv_clear ()
      #12 0x080e9c8a in Perl_sv_free2 ()
      ...
      #87303 0x080d7644 in Perl_hv_free_ent ()
      #87304 0x080d8bf3 in S_hfreeentries ()
      #87305 0x080db12e in Perl_hv_undef_flags ()
      #87306 0x080e97cb in Perl_sv_clear ()
      #87307 0x080e9c8a in Perl_sv_free2 ()
      #87308 0x080d7644 in Perl_hv_free_ent ()
      #87309 0x080d8bf3 in S_hfreeentries ()
      #87310 0x080db12e in Perl_hv_undef_flags ()
      #87311 0x080e97cb in Perl_sv_clear ()
      #87312 0x080e9c8a in Perl_sv_free2 ()
      #87313 0x080d7644 in Perl_hv_free_ent ()
      #87314 0x080d8bf3 in S_hfreeentries ()
      #87315 0x080db12e in Perl_hv_undef_flags ()
      #87316 0x080e97cb in Perl_sv_clear ()
      #87317 0x080e9c8a in Perl_sv_free2 ()
      #87318 0x08111ef1 in Perl_leave_scope ()
      #87319 0x081120bc in Perl_pop_scope ()
      #87320 0x0811dd60 in Perl_pp_return ()
      #87321 0x080dd748 in Perl_runops_standard ()
      #87322 0x08076475 in Perl_call_sv ()
      #87323 0xb7ac2148 in endElement () from /home/eric/usr/perlbrew/perls/5.14.2t/lib/site_perl/5.14.2/i686-linux-thread-multi/auto/XML/Parser/Expat/Expat.so
      #87324 0xb7a93a55 in ?? () from /usr/lib/../lib/libexpat.so.1
      #87325 0xb7a948a1 in ?? () from /usr/lib/../lib/libexpat.so.1
      #87326 0xb7a95db1 in ?? () from /usr/lib/../lib/libexpat.so.1
      #87327 0xb7a9696a in ?? () from /usr/lib/../lib/libexpat.so.1
      #87328 0xb7a8d64c in XML_ParseBuffer () from /usr/lib/../lib/libexpat.so.1
      #87329 0xb7a8eab5 in XML_Parse () from /usr/lib/../lib/libexpat.so.1
      #87330 0xb7ab6a78 in XS_XML__Parser__Expat_ParseString () from /home/eric/usr/perlbrew/perls/5.14.2t/lib/site_perl/5.14.2/i686-linux-thread-multi/auto/XML/Parser/Expat/Expat.so
      #87331 0x080df181 in Perl_pp_entersub ()
      #87332 0x080dd748 in Perl_runops_standard ()
      #87333 0x080770ea in perl_run ()
      #87334 0x0805fe3d in main ()

      First guess: a stack overflow from an endless(?) recursive loop. [Upd: It could be a stack overflow, but it's not from endless recursion. The pattern is clearly broken at the top.]

      The odd thing is that the loop is in perl's code.

      Same with an older version of Perl: 5.10.1 (i686-linux-thread-multi), XML::Twig 3.39, XML::Parser 2.41.

      I'll install a debug build of Perl and see if I hit an assert.

        The odd thing is that the loop is in perl's code.

        I suspect that isn't really a loop as much as Perl free()ing some moderately deeply nested hashes.

        - tye        
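To illustrate the suspicion above: when each hash holds the only reference to the next one, dropping the first hash makes perl free the rest one inside the other, so the depth of the C-level recursion grows with the length of the chain. The sketch below is not XML::Twig's actual data structure, only a stand-in chain of plain hashes; `build_chain` and `free_chain_iteratively` are names invented for the example. It shows how such a chain can be torn down one node at a time so that no single free has to descend through the whole chain.

```perl
use strict;
use warnings;

# Build a singly linked chain of $depth hashes. Each node holds the
# only reference to the next one (an assumed stand-in for deeply
# nested data, not XML::Twig's real layout).
sub build_chain {
    my ($depth) = @_;
    my $head = { id => 0 };
    my $node = $head;
    for my $i (1 .. $depth - 1) {
        $node->{next} = { id => $i };
        $node = $node->{next};
    }
    return $head;
}

# Tear the chain down one node at a time. Detaching 'next' before a
# node goes away means each free only ever handles a single hash.
# Returns the number of nodes visited.
sub free_chain_iteratively {
    my ($head) = @_;
    my $freed = 0;
    while ($head) {
        my $next = delete $head->{next};   # cut the link first
        $head = $next;                     # let go of the old node
        ++$freed;
    }
    return $freed;
}

# The depth is kept small here so the example is safe to run anywhere.
print free_chain_iteratively(build_chain(1000)), " nodes freed\n";
```

Whether a given depth actually overflows the C stack when freed in one go depends on the perl build and the stack size limit, which would fit with different posters seeing different breaking points.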

        5.14.2 i686-linux-thread-multi with -Doptimize=-g didn't reveal much more.

        It wouldn't hurt to include this in your bug report, but file it with XML::Parser, not XML::Twig.

      For me, it works up to 20_140 (linux, i686, Perl 5.14.2). For 20_142, it usually dies of SIGSEGV, but sometimes still works.