
'Hang' is such an ugly word :-) On my Athlon 2600 laptop it took nearly 2 hours to complete, but it did eventually finish.

I'm guessing that on your system XML::Simple is using XML::Parser. Internally, XML::Simple does something like this:

  my $xp = XML::Parser->new(Style => 'Tree');
  my $tree = $xp->parse($document);

and then proceeds to reduce $tree into something simpler.

If you try that snippet on your document, then I suspect you'll see similarly long processing times. I'm not sure why expat is so slow processing your XML, but it is rather 'unusual' XML.
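A minimal sketch of trying that snippet with a timer around it. The `$document` string here is a small made-up stand-in, so substitute the contents of your own file; the `eval`/`require` guard is only there so the script still runs where XML::Parser is not installed.

```perl
use strict;
use warnings;
use Time::HiRes qw(time);

# Stand-in for the real document: one element holding many lines of text.
# (Hypothetical data; substitute the contents of your own file.)
my $document = "<data>\n" . ("SUQzAgAAAAAQaVRUMgAAEQBrZWVw\n" x 1000) . "</data>\n";

my ($tree, $elapsed);
if (eval { require XML::Parser; 1 }) {
    my $xp    = XML::Parser->new(Style => 'Tree');
    my $start = time;
    $tree     = $xp->parse($document);    # same call as in the snippet above
    $elapsed  = time - $start;
    printf "XML::Parser took %.3f seconds\n", $elapsed;
}
else {
    print "XML::Parser is not installed, skipping\n";
}
```

With the 'Tree' style the result is an array reference whose first element is the root tag name, so `$tree->[0]` should be 'data' here.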

Your original document had over 100,000 lines. I benchmarked passing shorter versions of the document through different parser modules using XML::Simple. (If you install XML::SAX, you can choose alternative parsers by assigning a parser module name to $XML::Simple::PREFERRED_PARSER.) I would caution against reading anything at all into these results in a general sense, but in the case of your specific data they are interesting (run times in seconds):

Parser	5000 lines	10000 lines	15000 lines	20000 lines	25000 lines
XML::Parser	8	40	92	169	267
XML::SAX::Expat	8	40	95	173	272
XML::SAX::ExpatXS	4	27	66	121	191
XML::SAX::PurePerl	9	20	29	39	49
XML::LibXML::SAX::Parser	1	1	1	1	1

As you can see, the run times for all the expat-based parsers grow much faster than linearly with the size of the file (roughly quadratically, going by these numbers). The PurePerl parser is coping amazingly well and its run times are only increasing linearly. The libxml-based SAX parser is the clear winner in this race, with run times so short I can't see an increase at all.
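As a rough check on those growth rates, here is a sketch that estimates the exponent k in "time ~ lines**k" from the first and last columns of the table. It is a two-point estimate on coarse timings, so treat the result as indicative only; `growth_exponent` is a helper name made up for this example. The libxml parser is left out because its times are all 1 at this resolution.

```perl
use strict;
use warnings;

# Run times in seconds, copied from the table above.
my @lines = (5000, 10000, 15000, 20000, 25000);
my %times = (
    'XML::Parser'        => [8, 40, 92, 169, 267],
    'XML::SAX::Expat'    => [8, 40, 95, 173, 272],
    'XML::SAX::ExpatXS'  => [4, 27, 66, 121, 191],
    'XML::SAX::PurePerl' => [9, 20, 29, 39, 49],
);

# Slope of the line through the first and last points on a log-log plot.
# If time ~ lines**k, this slope estimates k.
sub growth_exponent {
    my ($t) = @_;
    return log($t->[-1] / $t->[0]) / log($lines[-1] / $lines[0]);
}

my %exponent = map { $_ => growth_exponent($times{$_}) } keys %times;
printf "%-20s k = %.2f\n", $_, $exponent{$_} for sort keys %exponent;
```

The expat-based parsers come out with k a little over 2 and PurePerl with k close to 1, which is what you would expect from quadratic and linear growth respectively.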

Just to illustrate how unusual your XML is compared to 'normal' XML, I changed the 20,000 line file so that instead of looking like this...

<data>
SUQzAgAAAAAQaVRUMgAAEQBrZWVwIFlhIEhlYWQgVXAAVFAxAAAUAEhhcmxlbSB0aG
cwBUQUwAABQARnJlZSBBZ2VudC9EZWMuIDA0AFRSSwAABQAzLzMAVEVOAAANAGlUdW
AENPTQAAaABlbmdpVHVuTk9STQAgMDAwMDA0NUEgMDAwMDAwMDMgMDAwMDMwRjAgMD
MDAwMENBNTggMDAwMkNDNkQgMDAwMDgyM0MgMDAwMDgwMEQgMDAwNDAxODIgMDAwND
...
</data>

it looked like this ...

<data>
<line>SUQzAgAAAAAQaVRUMgAAEQBrZWVwIFlhIEhlYWQgVXAAVFAxAAAUAEhhcmxlbSB0aG</line>
<line>cwBUQUwAABQARnJlZSBBZ2VudC9EZWMuIDA0AFRSSwAABQAzLzMAVEVOAAANAGlUdW</line>
<line>AENPTQAAaABlbmdpVHVuTk9STQAgMDAwMDA0NUEgMDAwMDAwMDMgMDAwMDMwRjAgMD</line>
<line>MDAwMENBNTggMDAwMkNDNkQgMDAwMDgyM0MgMDAwMDgwMEQgMDAwNDAxODIgMDAwND</line>
...
</data>
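For anyone wanting to repeat the experiment, a conversion along these lines would produce the second form from the first. It is only a sketch: `wrap_lines` is a name made up here, the sample input is shortened, and it assumes (as in the sample) that the <data> tags sit on lines of their own.

```perl
use strict;
use warnings;

# Wrap every non-empty line of text inside <data>...</data> in <line> tags.
# Assumes the <data> and </data> tags each sit on a line of their own.
sub wrap_lines {
    my ($xml) = @_;
    my @out;
    for my $line (split /\n/, $xml) {
        if ($line =~ /^\s*<\/?data>\s*$/ or $line !~ /\S/) {
            push @out, $line;                 # leave tags and blank lines alone
        }
        else {
            push @out, "<line>$line</line>";  # base64 text needs no XML escaping
        }
    }
    return join("\n", @out) . "\n";
}

# Shortened, hypothetical sample in the original one-big-text-node style.
my $original = "<data>\nSUQzAgAA\ncwBUQUwA\n</data>\n";
my $modified = wrap_lines($original);
print $modified;
```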

You might imagine that introducing all those extra <line> tags, and the extra structure they imply, would give the parsers more work to do and therefore make them slower. But here are the timings (in seconds) for processing the modified version:

Parser	20000 lines
XML::Parser	2
XML::SAX::Expat	7
XML::SAX::ExpatXS	5
XML::SAX::PurePerl	73
XML::LibXML::SAX::Parser	7

Yes, you read that right, the run time using XML::Parser dropped from 169 seconds to 2! This is a much more 'normal' result for a parser shootout. PurePerl is the clear loser and expat is the clear winner. LibXML is looking pretty good, but that parser's advantages don't really shine unless you're building DOM trees - when it moves into a class of its own.
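A shootout like this can be repeated by pointing XML::Simple at each parser in turn through $XML::Simple::PREFERRED_PARSER. The following is a sketch, not the script used for the numbers above: the document is a small made-up sample, the parser list is just an example, and the `eval` guards simply skip any parser that is not installed.

```perl
use strict;
use warnings;
use Time::HiRes qw(time);

# Hypothetical small document in the modified, one-<line>-per-row style.
my $xml = "<data>\n" . ("<line>SUQzAgAAAAAQaVRUMgAA</line>\n" x 200) . "</data>\n";

my @parsers = qw(XML::Parser XML::SAX::PurePerl);   # add others if installed
my %seconds;

if (eval { require XML::Simple; 1 }) {
    for my $parser (@parsers) {
        # Tell XML::Simple which parser module to use for this pass.
        local $XML::Simple::PREFERRED_PARSER = $parser;
        my $start = time;
        my $data  = eval { XML::Simple::XMLin($xml) };
        if (defined $data) {
            $seconds{$parser} = time - $start;
            printf "%-20s %.3f s\n", $parser, $seconds{$parser};
        }
        else {
            print "$parser unavailable, skipping\n";   # e.g. XML::SAX not installed
        }
    }
}
else {
    print "XML::Simple is not installed, skipping\n";
}
```

For meaningful numbers you would want a much larger input and several runs per parser; a single pass over 200 lines mostly measures module load time.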

So what's causing your problem? Dunno. You seem to have triggered some pathological corner case in expat. But really, it seems to me like a case of "Doctor, it hurts when I do this". Have you considered passing just the filename of the MP3 file instead of its contents?

Update: Yeah, thanks dbecoll - I obviously had my brain disengaged when I originally typed 'geometrically' instead of 'linearly' for the PurePerl run times.

In reply to Re: XML::Simple hangs by grantm
in thread XML::Simple hangs by Ovid
