Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

I'm trying to figure out how best to parse XML or HTML messages with POE in a non-blocking way. I'm pulling XML and HTML with a PoCo HTTP client and receiving messages sized anywhere between 2K and 10MB. SAX-based parsing seems to be the best bet, since I can trigger events as tags are encountered. But what happens when there is a lot of character data between tags? In that case the parse would block until the next tag is found. Is there a proper way to do this?

Replies are listed 'Best First'.
Re: POE and non-blocking XML/HTML parsing
by rcaputo (Chaplain) on Apr 22, 2006 at 18:54 UTC

    You might try POE::Filter::XML, although I've never used it. Remember: You can use POE::Filter::* objects without attaching them to POE::Wheel objects, so it's possible to feed streamed data through them.
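To make the standalone-filter idea concrete, here is a minimal sketch of the POE::Filter API, using POE::Filter::Line (which ships with the POE distribution) as a stand-in; POE::Filter::XML exposes the same get_one_start/get_one calls, so you would swap it in for real XML streams. The chunk boundaries below are contrived for illustration.

```perl
use strict;
use warnings;
use POE::Filter::Line;   # stand-in here; POE::Filter::XML has the same API

my $filter = POE::Filter::Line->new();

# Feed arriving chunks in as the HTTP client hands them to you...
$filter->get_one_start(["first li"]);
$filter->get_one_start(["ne\nsecond line\n"]);

# ...and drain whatever complete records have formed so far.
while (1) {
    my $records = $filter->get_one();
    last unless @$records;
    print "$_\n" for @$records;
}
```

Because get_one returns only the records completed so far, each call does a bounded amount of work, which is what keeps the event loop responsive.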

Re: POE and non-blocking XML/HTML parsing
by aufflick (Deacon) on Apr 24, 2006 at 06:15 UTC
    For a more perl-ish way of triggering events on tags, take a look at XML::Twig. It's pretty nice.
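A small illustration of the XML::Twig approach (assuming XML::Twig is installed; the `<books>`/`<title>` document is made up for the demo): handlers fire per completed element, and purge() throws away what has already been handled, keeping memory flat on large documents.

```perl
use strict;
use warnings;
use XML::Twig;

my @titles;
my $twig = XML::Twig->new(
    twig_handlers => {
        # Fires as soon as each <title> element is completely parsed.
        title => sub {
            my ($t, $elt) = @_;
            push @titles, $elt->text;
            $t->purge;   # discard the already-handled part of the tree
        },
    },
);

$twig->parse('<books><book><title>A</title></book><book><title>B</title></book></books>');
print join(',', @titles), "\n";   # A,B
```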
Re: POE and non-blocking XML/HTML parsing
by Anonymous Monk on Apr 24, 2006 at 20:41 UTC
    Both modules look like they would work, but would I have to create a separate parser for each request, so that the XML streams are not mixed up? Since I'll be pulling multiple files at the same time in 'Streaming' mode, I'm worried that the XML chunks from the different requests will get mixed up.
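One way to keep the streams separate is simply one parser object per request, created lazily and keyed on the request's tag, then discarded when the response completes. The sketch below uses a trivial made-up stand-in class (ChunkCollector is hypothetical, as is feed_chunk); in practice you would store a POE::Filter::XML or XML::Twig instance per tag. The keying, not the parsing, is the point.

```perl
use strict;
use warnings;

# ChunkCollector is a made-up stand-in for a real streaming parser.
package ChunkCollector;
sub new  { bless { buf => '' }, shift }
sub feed { my ($self, $chunk) = @_; $self->{buf} .= $chunk }
sub buf  { $_[0]{buf} }

package main;

my %parser_for;   # one parser per in-flight request tag

sub feed_chunk {
    my ($tag, $chunk) = @_;
    $parser_for{$tag} //= ChunkCollector->new;   # created on first chunk
    $parser_for{$tag}->feed($chunk);
}

# Interleaved chunks from two requests stay separate:
feed_chunk(req1 => '<a>');
feed_chunk(req2 => '<b>');
feed_chunk(req1 => '</a>');
print $parser_for{req1}->buf, "\n";   # <a></a>
print $parser_for{req2}->buf, "\n";   # <b>

# When a response finishes, drop its parser to free the state:
delete $parser_for{req2};
```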

      You've said so little about what is driving this processing that it is difficult to see why you need multi-tasking.

      Why not just start a new process for each url that fetches the XML and processes it?
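A bare-bones sketch of that process-per-URL approach, using core fork/waitpid (the URLs are placeholders, and the fetch-and-parse step is left as a comment, e.g. LWP plus XML::Twig):

```perl
use strict;
use warnings;

# Placeholder URLs; each child would fetch and parse one of them.
my @urls = ('http://example.com/a.xml', 'http://example.com/b.xml');

my @pids;
for my $url (@urls) {
    my $pid = fork();
    die "fork failed: $!" unless defined $pid;
    if ($pid == 0) {
        # Child: fetch $url and parse it here, blocking freely --
        # it cannot stall any other download -- then exit.
        exit 0;
    }
    push @pids, $pid;
}

waitpid($_, 0) for @pids;   # parent blocks only on reaping, not parsing
print "fetched ", scalar(@urls), " urls\n";
```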


      Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
      Lingua non convalesco, consenesco et abolesco. -- Rule 1 has a caveat! -- Who broke the cabal?
      "Science is about questioning the status quo. Questioning authority".
      In the absence of evidence, opinion is indistinguishable from prejudice.
        Because that's horribly inefficient when you're dealing with many, many data sources.