vadim_t has asked for the wisdom of the Perl Monks concerning the following question:

I'm writing a preforking server using Net::Server. I use a socketpair to talk to the parent server when needed, so every child uses IO::Select on two handles: the socket to the client and the socket to the parent.

Now, my problem is that the client protocol is XML, parsed with XML::Parser, and I'm having trouble deciding when to parse it. People suggested using a stream parser, but if I understood correctly, that would mean passing the client socket to the parser, which would make the parser block on it. Unless I'm missing something, this would leave me unable to use parent/child communication.

I've thought of separating requests in different ways, like length-data packets, separating XML with \0 or something else, making all XML go inside <data></data> which would be extracted with a regexp... but all those are pretty ugly and would make debugging harder than it could be.

I'm pretty sure there's got to be some way of doing stream parsing without having to throw out the parent/child communication.

Could anybody help me with this?

Thanks

Re: How to parse XML coming from a socket?
by merlyn (Sage) on Oct 04, 2003 at 14:05 UTC
    Sounds like you're reinventing SOAP or Jabber. Might be a good time to look at existing implementations and either reuse or cannibalize them rather than try to solve problems that have been solved repeatedly by others. Save your mind for solving your unique issues, not reinventing the wheel.

    -- Randal L. Schwartz, Perl hacker
    Be sure to read my standard disclaimer if this is a reply.

      Kind of, perhaps. But there's not really much that I have left to solve. Somebody on #perl commented that basically I'm doing a kind of HURD. The server itself is a small core with several independent support services around it. The server already works, as well as the parsing. Basically, that's the largest problem I have right now. If I can fix it, pretty much all that is left is to design a proper protocol, and add handlers.
        "pretty much all that is left is to design a proper protocol, and add handlers."
        Heh. You probably don't realize that you did the easy part, and the hard part is what's ahead.

        Designing a protocol that is clean, transparent, scalable, and reusable is a royal pain. If you had leveraged off SOAP or Jabber, you'd also have the advantage of interoperability and pre-existing mindset and docs.

        Good luck, because it looks like you're doing things the hard way.

        -- Randal L. Schwartz, Perl hacker
        Be sure to read my standard disclaimer if this is a reply.

Re: How to parse XML coming from a socket?
by mirod (Canon) on Oct 04, 2003 at 16:48 UTC

    Have you looked at XML::Stream?

    Also, XML::Parser actually can parse a stream of XML documents, separated by a delimiter. See the Stream_Delimiter option.
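
    On the blocking concern in the original question: XML::Parser also has a non-blocking interface (parse_start, which returns an XML::Parser::ExpatNB object with parse_more and parse_done), so the child can keep its own IO::Select loop and feed the parser whatever bytes have arrived. The following is only a minimal sketch under some assumptions: the two socketpairs stand in for the real client and parent sockets, the variable names and the "reload" message are made up for illustration, and the client's whole connection is treated as one XML document that ends when the client closes its side.

```perl
use strict;
use warnings;
use IO::Select;
use Socket;
use XML::Parser;

# Hypothetical stand-ins for the two handles each child watches.
socketpair(my $client, my $client_peer, AF_UNIX, SOCK_STREAM, PF_UNSPEC)
    or die "socketpair: $!";
socketpair(my $parent, my $parent_peer, AF_UNIX, SOCK_STREAM, PF_UNSPEC)
    or die "socketpair: $!";

my @elements;       # element names reported by the Start handler
my @parent_msgs;    # raw data received from the parent

my $parser = XML::Parser->new(
    Handlers => { Start => sub { push @elements, $_[1] } },
);
my $nb = $parser->parse_start;    # non-blocking parser object

# Simulate the far ends: one client request, one note from the parent.
syswrite $client_peer, '<data><to>server</to></data>';
shutdown $client_peer, 1;         # client has finished writing
syswrite $parent_peer, "reload\n";

my $sel  = IO::Select->new($client, $parent);
my $done = 0;
while (!$done and my @ready = $sel->can_read(1)) {
    for my $fh (@ready) {
        my $n = sysread($fh, my $buf, 4096);
        die "sysread: $!" unless defined $n;
        if ($fh == $parent) {
            # Parent traffic is handled between XML chunks.
            if ($n) { push @parent_msgs, $buf } else { $sel->remove($fh) }
        }
        elsif ($n) {
            $nb->parse_more($buf);    # feed only what has arrived
        }
        else {
            $nb->parse_done;          # EOF from client: finish the parse
            $done = 1;
        }
    }
}

print "elements: @elements\n";
print "parent said: @parent_msgs";
```

    The point of the design is that the parser never touches the socket itself, so nothing blocks except the can_read call that already covers both handles.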

      Thanks, XML::Stream seems to be pretty much what I need. It looks like the timeout in Process can be exactly what I need to get it done.

      It also seems that I could use the Stream_Delimiter option, although then it looks like I'd have to pick some string that wouldn't appear in normal output.

        and take a look at Net::Jabber this time because it uses XML::Stream for its reading/writing sockets. (and not because you are reinventing it =P)

        from what i gleaned from your other posts i think you do need to wrap your messages in a top-level node like:

        <data>
          <stuff>whatever goes here</stuff>
          <to>server</to>
          <from>client</from>
        </data>

        if you want to make things easier.

        and you should check out the Jabber specs just because there's a lot of well-thought-out XML in the specs (and the experimental specs) for things like queries/messages/presence and just about anything else you could think of to pass through an XML message router.

Re: How to parse XML coming from a socket?
by Aristotle (Chancellor) on Oct 04, 2003 at 17:36 UTC

    I'd suggest a switch to XML::LibXML.

    It can accept an XML document in chunks, the way you need it. You get a DOM and SAX parser in one package and can mix access to DOM trees with XPath expressions. The actual XML parsing and document tree storage is all done on the C side, so it's much faster and consumes astonishingly little memory compared with XML::Parser-based modules.

    It doesn't get any better than that. I'd been a fan of XML::Twig until I discovered this one. Don't leave home without it.
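
    A minimal sketch of the chunked parsing mentioned above, using XML::LibXML's push interface (init_push, push, finish_push). The chunk contents and split points are made up for illustration; in the server they would be whatever each read from the client socket returned.

```perl
use strict;
use warnings;
use XML::LibXML;

my $parser = XML::LibXML->new;

# Chunks as they might arrive from successive reads; the split points
# are arbitrary and do not have to fall on tag boundaries.
my @chunks = (
    '<data><stuff>whatever go',
    'es here</stuff><to>ser',
    'ver</to></data>',
);

$parser->init_push;
$parser->push($_) for @chunks;
my $doc = $parser->finish_push;    # returns the finished document

# XPath access to the finished DOM tree.
my $to    = $doc->findvalue('/data/to');
my $stuff = $doc->findvalue('/data/stuff');
print "to=$to stuff=$stuff\n";
```

    One caveat with this sketch: the caller still has to decide when a document is complete before calling finish_push, so some convention for where one request ends is needed in the protocol.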

    Makeshifts last the longest.

      This is interesting: I was actually thinking of writing a module similar to XML::DOM::Twig (which is hopelessly out of date BTW), on top of XML::LibXML. It would use as many of XML::LibXML's native functions as possible, but add convenience methods from XML::Twig. I found that using XPath in XML::LibXML makes it really easy to access data in the XML tree, but that you then have to use the DOM for modifying it, which is verbose and clumsy.

      Does this make sense, and which methods do you think would be most helpful? For example, I really miss XML::Twig's version of insert when I use the DOM. prefix is another really useful one that can't be done simply.

        To be honest, I have no idea. So far I've had no need to transform existing documents, and mostly need to parse XML documents or occasionally create ones from scratch. Full XPath support is very helpful for parsing; Twig's cut-down version doesn't compare. Nevertheless, something along the lines of XML::LibXML::Twig would certainly be handy when I eventually have to deal with transformation-type tasks.

        Makeshifts last the longest.

      XML::LibXML looks very good too, thanks!

      So many helpful replies :-) Now I'm going to have to choose one of them, though.