Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

I'm trying to figure out how best to parse XML or HTML messages with POE in a non-blocking way. I'm pulling XML and HTML with a PoCo HTTP client and receiving messages sized anywhere between 2K and 10MB. SAX-based parsing seems to be the best bet, since I can trigger events as tags are encountered. But what happens when there is a lot of character data between tags? In that case the parse would block until the next tag is found. Is there a proper way to do this?

Replies are listed 'Best First'.
Re: POE and non-blocking XML/HTML parsing
by rcaputo (Chaplain) on Apr 22, 2006 at 18:54 UTC

    You might try POE::Filter::XML, although I've never used it. Remember: You can use POE::Filter::* objects without attaching them to POE::Wheel objects, so it's possible to feed streamed data through them.
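To make the standalone-filter idea concrete, here is a minimal sketch of the POE::Filter API, using POE::Filter::Line (which ships with the POE distribution) as a stand-in; POE::Filter::XML exposes the same get_one_start/get_one calls, so you would swap it in for real XML streams. The chunk boundaries below are contrived for illustration.

```perl
use strict;
use warnings;
use POE::Filter::Line;   # stand-in here; POE::Filter::XML has the same API

my $filter = POE::Filter::Line->new();

# Feed arriving chunks in as the HTTP client hands them to you...
$filter->get_one_start(["first li"]);
$filter->get_one_start(["ne\nsecond line\n"]);

# ...and drain whatever complete records have formed so far.
while (1) {
    my $records = $filter->get_one();
    last unless @$records;
    print "$_\n" for @$records;
}
```

Because get_one returns only the records completed so far, each call does a bounded amount of work, which is what keeps the event loop responsive.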

Re: POE and non-blocking XML/HTML parsing
by aufflick (Deacon) on Apr 24, 2006 at 06:15 UTC
    For a more perl-ish way of triggering events on tags, take a look at XML::Twig. It's pretty nice.
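A small illustration of the XML::Twig approach (assuming XML::Twig is installed; the `<books>`/`<title>` document is made up for the demo): handlers fire per completed element, and purge() throws away what has already been handled, keeping memory flat on large documents.

```perl
use strict;
use warnings;
use XML::Twig;

my @titles;
my $twig = XML::Twig->new(
    twig_handlers => {
        # Fires as soon as each <title> element is completely parsed.
        title => sub {
            my ($t, $elt) = @_;
            push @titles, $elt->text;
            $t->purge;   # discard the already-handled part of the tree
        },
    },
);

$twig->parse('<books><book><title>A</title></book><book><title>B</title></book></books>');
print join(',', @titles), "\n";   # A,B
```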
Re: POE and non-blocking XML/HTML parsing
by Anonymous Monk on Apr 24, 2006 at 20:41 UTC
    Both modules look like they would work, but would I have to create a separate parser for each request, so that the XML streams are not mixed up? Since I'll be pulling multiple files at the same time in 'Streaming' mode, I'm worried that the XML chunks from the different requests will get mixed up.
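One way to keep the streams separate is simply one parser object per request, created lazily and keyed on the request's tag, then discarded when the response completes. The sketch below uses a trivial made-up stand-in class (ChunkCollector is hypothetical, as is feed_chunk); in practice you would store a POE::Filter::XML or XML::Twig instance per tag. The keying, not the parsing, is the point.

```perl
use strict;
use warnings;

# ChunkCollector is a made-up stand-in for a real streaming parser.
package ChunkCollector;
sub new  { bless { buf => '' }, shift }
sub feed { my ($self, $chunk) = @_; $self->{buf} .= $chunk }
sub buf  { $_[0]{buf} }

package main;

my %parser_for;   # one parser per in-flight request tag

sub feed_chunk {
    my ($tag, $chunk) = @_;
    $parser_for{$tag} //= ChunkCollector->new;   # created on first chunk
    $parser_for{$tag}->feed($chunk);
}

# Interleaved chunks from two requests stay separate:
feed_chunk(req1 => '<a>');
feed_chunk(req2 => '<b>');
feed_chunk(req1 => '</a>');
print $parser_for{req1}->buf, "\n";   # <a></a>
print $parser_for{req2}->buf, "\n";   # <b>

# When a response finishes, drop its parser to free the state:
delete $parser_for{req2};
```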

      You've said so little about what is driving this processing that it is difficult to see why you need multi-tasking.

      Why not just start a new process for each url that fetches the XML and processes it?
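A bare-bones sketch of that process-per-URL approach, using core fork/waitpid (the URLs are placeholders, and the fetch-and-parse step is left as a comment, e.g. LWP plus XML::Twig):

```perl
use strict;
use warnings;

# Placeholder URLs; each child would fetch and parse one of them.
my @urls = ('http://example.com/a.xml', 'http://example.com/b.xml');

my @pids;
for my $url (@urls) {
    my $pid = fork();
    die "fork failed: $!" unless defined $pid;
    if ($pid == 0) {
        # Child: fetch $url and parse it here, blocking freely --
        # it cannot stall any other download -- then exit.
        exit 0;
    }
    push @pids, $pid;
}

waitpid($_, 0) for @pids;   # parent blocks only on reaping, not parsing
print "fetched ", scalar(@urls), " urls\n";
```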


      Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
      Lingua non convalesco, consenesco et abolesco. -- Rule 1 has a caveat! -- Who broke the cabal?
      "Science is about questioning the status quo. Questioning authority".
      In the absence of evidence, opinion is indistinguishable from prejudice.
        Because that's horribly inefficient when you're dealing with many, many data sources.