thistle has asked for the wisdom of the Perl Monks concerning the following question:

Hi folks!

I'm trying to add a component to an existing POE daemon that needs to hold a persistent connection to an HTTP server, wait for alerts in XML, and process them. PoCo::Client::HTTP works well for receiving the stream in chunks.

What I need recommendations on is how to reassemble the chunks as they are passed to the output event. Ultimately I want to detect when a top-level node is complete, parse it, and later insert it into a DB. I could probably hand-roll this by caching the XML chunks and pulling complete nodes off FIFO-style, or write the chunks to a file and tail it with a SAX-based parser. But I'm wondering whether a parser already does this, or whether anyone has a more elegant solution.

Re: Processing XML stream via HTTP
by spx2 (Deacon) on May 02, 2009 at 01:34 UTC

    A FIFO data structure (a queue, in particular) would be fine for what you need: push data on one end and read it line by line from the other. I haven't tried a SAX-based parser, but the libraries/modules I've used in both Perl and C++ built a tree structure in memory to hold the XML. If that is what the SAX-based parser does too, it won't be of any use here: by passing the XML through a stream you only ever hold an incomplete document, which is not valid XML on its own but just part of a valid one (the original you're sending over HTTP).

    To sum up: two threads/processes, one reading the XML from the stream and one processing the queue and filling in an entry structure. When you hit the end tag of your entry, </name_of_tag>, put the entry in the DB and continue. You'll also need some form of IPC, because the queue has to be shared between the threads/processes.
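
    The buffering half of this idea can be sketched in plain Perl, leaving out the threads and IPC. This is only a rough illustration under stated assumptions: the element name `alert`, the `feed_chunk` helper and the sample chunks are all hypothetical, and the regex assumes the elements are never nested, never self-closing, and that the closing tag never appears inside CDATA or comments.

```perl
use strict;
use warnings;

# Hypothetical top-level element name; substitute whatever the feed uses.
my $TAG = 'alert';

my $buffer = '';    # unparsed tail of the stream
my @queue;          # complete elements, oldest first (the FIFO)

# Append one chunk and move every finished element onto the queue.
sub feed_chunk {
    my ($chunk) = @_;
    $buffer .= $chunk;

    # Drop anything before the next <alert ...> start tag, then capture up to
    # the first </alert>. A partial element stays in $buffer for the next call.
    while ( $buffer =~ s{\A.*?(<\Q$TAG\E\b[^>]*>.*?</\Q$TAG\E\s*>)}{}s ) {
        push @queue, $1;
    }
    return scalar @queue;
}

# Simulated chunks, split at arbitrary points as an HTTP stream might be.
feed_chunk($_) for
    '<alerts><alert id="1"><msg>disk',
    ' full</msg></alert><alert id="2"><ms',
    'g>fan</msg></alert>';

print "$_\n" for @queue;
```

    Each string pulled off `@queue` is a complete fragment, so it can be handed to whatever parser you like before the DB insert. A real parser is the safer choice if the feed can contain nesting or CDATA.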

Re: Processing XML stream via HTTP
by aufflick (Deacon) on May 03, 2009 at 08:59 UTC
    A great way to parse a stream of XML is XML::Twig - you specify callbacks (or "handlers" in the nomenclature of the XML::Twig documentation) for the tags you're interested in, and parse chunks of XML as they arrive. Your callback will be called as soon as the tags you are watching arrive.
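
    A minimal sketch of the handler style described here, assuming a hypothetical feed whose top-level elements are `<alert id="..."><msg>...</msg></alert>`. For simplicity it parses one complete string; how to feed it the partial chunks delivered by PoCo::Client::HTTP is not shown and should be checked against the XML::Twig and XML::Parser documentation.

```perl
use strict;
use warnings;
use XML::Twig;

my @alerts;    # stand-in for the eventual DB insert

my $twig = XML::Twig->new(
    twig_handlers => {
        # Called each time a complete <alert> element has been parsed.
        # The element and field names are assumptions for illustration.
        alert => sub {
            my ( $t, $elt ) = @_;
            push @alerts, {
                id  => $elt->att('id'),
                msg => $elt->first_child_text('msg'),
            };
            $t->purge;    # free the memory used by what has been parsed so far
        },
    },
);

$twig->parse(
    '<alerts>'
  . '<alert id="1"><msg>disk full</msg></alert>'
  . '<alert id="2"><msg>fan</msg></alert>'
  . '</alerts>'
);

printf "%s: %s\n", $_->{id}, $_->{msg} for @alerts;
```

    The call to `purge` inside the handler is what keeps memory flat on a long-lived connection, since the document never ends and would otherwise accumulate in the twig.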