xml processing line-by-line

jczeus has asked for the wisdom of the Perl Monks concerning the following question:

I have a server that receives XML commands line-by-line.

So far, it collects the lines in a string, and when it thinks it's "finished", it parses the string with XML::LibXML.

That means: if I want to know that I'm finished, I have to parse the data myself before passing it to the parser and try to find the end tag of the document. Yeah, great...

I thought (hah!) it would work with parse_chunk(), but this method doesn't know when the document end is reached, either.

Does anyone have a solution for this? Maybe the only option is to require the sender to transmit the document length (in bytes) before the document itself; that way it would be easy to know when the end has been reached.


Re: xml processing line-by-line
by Trizor (Pilgrim) on Jul 24, 2007 at 14:16 UTC

Depending on the speed of the parser and how much speed matters, you could parse the accumulated buffer in an eval and catch the exception (or check for errors however XML::LibXML reports them).
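
Here is a minimal sketch of the eval idea. It assumes XML::LibXML's load_xml, and the helper names try_parse and collect_documents are hypothetical. The sketch re-parses the whole buffer after every line, so its cost grows with document size. It also cannot tell "not finished yet" from "malformed", which means a broken document simply never completes. load_xml checks well-formedness only, because DTD validation stays off unless you request it.

```perl
use strict;
use warnings;
use XML::LibXML;

# Try to parse everything received so far. Returns the document object if
# XML::LibXML accepts the buffer, or undef if load_xml threw an exception
# (e.g. because the document is still incomplete).
sub try_parse {
    my ($buffer) = @_;
    return eval { XML::LibXML->load_xml(string => $buffer) };
}

# Feed lines one at a time and return the root element name of every
# document that became complete along the way.
sub collect_documents {
    my @lines = @_;
    my @roots;
    my $buffer = '';
    for my $line (@lines) {
        $buffer .= $line;
        if (my $doc = try_parse($buffer)) {
            push @roots, $doc->documentElement->nodeName;
            $buffer = '';    # start accumulating the next document
        }
    }
    return @roots;
}

my @roots = collect_documents("<cmd>\n", "  <name>ping</name>\n", "</cmd>\n");
print "completed: @roots\n";
```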

You also have other options: could the sender append an EOF marker or some other terminator to the message to indicate that the XML is done? XML isn't line-oriented and can be rather complex, so you want to avoid being in the business of parsing it yourself if at all possible (unless you're writing a parser...).
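
Here is a sketch of the terminator idea. It assumes a line-based protocol in which the sender ends each document with a line containing only ".". The marker, the helper read_document and the 1 MB cap are all assumptions, not part of any existing protocol. The cap means a sender that forgets the terminator cannot make the server buffer without limit. The marker line must also never occur inside the XML itself.

```perl
use strict;
use warnings;

# Assumed protocol: each document ends with a line containing only $END_MARKER.
# Both the marker and the size cap are illustrative choices.
my $END_MARKER = '.';
my $MAX_BYTES  = 1_000_000;

# Read one document from $fh. Returns the document text, or undef if the
# connection closed before the marker arrived. Dies if the document grows
# past $MAX_BYTES, so a forgotten marker cannot exhaust memory.
sub read_document {
    my ($fh) = @_;
    my $doc = '';
    while (defined(my $line = <$fh>)) {
        (my $bare = $line) =~ s/\r?\n\z//;    # ignore the line ending when comparing
        return $doc if $bare eq $END_MARKER;
        $doc .= $line;
        die "document exceeds $MAX_BYTES bytes\n" if length($doc) > $MAX_BYTES;
    }
    return;    # EOF without a terminator: the partial data is discarded
}

# Usage with an in-memory filehandle standing in for the client socket:
open my $fh, '<', \"<cmd/>\n.\n<cmd>\n  <arg/>\n</cmd>\n.\n" or die $!;
while (defined(my $doc = read_document($fh))) {
    print "got document:\n$doc";
}
```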

Re^2: xml processing line-by-line
by jczeus (Monk) on Jul 24, 2007 at 14:52 UTC

1.) Do you mean "parse it until there are no more errors"? I thought of that, too. But there can be errors even when the document is complete: when it doesn't follow the DTD, for instance.

2.) Again, I thought of that, too. ;-)
But if the sender forgets this, the server appears to hang and eats more and more memory (as more data comes in).

The best approach really seems to be a header giving the content type and the length, sent at the beginning (before the document itself).

This way it would also be easy to use a different format, e.g. YAML, without having to guess it.
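
Here is a sketch of length-prefixed framing. It assumes a hypothetical one-line header of the form "<content-type> <length-in-bytes>", followed by exactly that many bytes. The helper read_message and the header layout are illustrative, not an existing protocol. Because the server reads a fixed byte count, it never has to look inside the payload to find its end. The content type then tells it whether to hand the body to an XML or a YAML parser. If the handle has an encoding layer, read() counts characters instead of bytes, so keep the socket in binary mode.

```perl
use strict;
use warnings;

# Assumed framing: a header line "<content-type> <length-in-bytes>" followed
# by exactly that many bytes of payload. Header format and cap are illustrative.
my $MAX_BYTES = 1_000_000;

# Read one message from $fh. Returns (type, body), or an empty list on a clean
# EOF between messages. Dies on a malformed header or an oversized/truncated body.
sub read_message {
    my ($fh) = @_;
    my $header = <$fh>;
    return unless defined $header;
    $header =~ s/\r?\n\z//;
    my ($type, $length) = $header =~ /\A(\S+)\s+(\d+)\z/
        or die "malformed header: '$header'\n";
    die "message of $length bytes exceeds $MAX_BYTES\n" if $length > $MAX_BYTES;

    my $body = '';
    while (length($body) < $length) {
        # read() appends at the given offset and may return fewer bytes than asked
        my $got = read($fh, $body, $length - length($body), length($body));
        die "read error: $!\n" unless defined $got;
        die "connection closed mid-message\n" if $got == 0;
    }
    return ($type, $body);
}

# Usage: the content type tells the server which parser gets the body.
open my $fh, '<', \"application/xml 7\n<cmd/>\ntext/yaml 7\nkey: 1\n" or die $!;
while (my ($type, $body) = read_message($fh)) {
    print "$type: $body";
}
```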

Re: xml processing line-by-line
by gam3 (Curate) on Jul 24, 2007 at 14:37 UTC

HTML::Parser
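
HTML::Parser accepts input in arbitrary chunks through parse() and reports start and end tags through callbacks. One way to use it here, which is an assumption about what was meant, is to count element depth and treat a return to zero as the end of the document. make_end_detector is a hypothetical helper. HTML::Parser is not a conforming XML parser, so this sketch only finds the document boundary. The collected text would still go to XML::LibXML for the real parse.

```perl
use strict;
use warnings;
use HTML::Parser;

# Returns a closure that takes the next line of input and reports whether the
# root element has been closed, by counting depth from start/end events.
sub make_end_detector {
    my $depth    = 0;
    my $complete = 0;
    my $p = HTML::Parser->new(
        api_version => 3,
        start_h     => [ sub { $depth++ }, 'tagname' ],
        end_h       => [ sub { $complete = 1 if --$depth == 0 }, 'tagname' ],
    );
    # XML mode: case-sensitive names, and <empty/> tags are reported as a
    # start event followed by an end event, so they leave the depth unchanged.
    $p->xml_mode(1);
    return sub {
        my ($line) = @_;
        $p->parse($line);    # parse() may be called repeatedly with successive chunks
        return $complete;
    };
}

my $done = make_end_detector();
for my $line ("<cmd>\n", "  <arg/>\n", "</cmd>\n") {
    print $done->($line) ? "complete\n" : "waiting\n";
}
```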

-- gam3
A picture is worth a thousand words, but takes 200K.

Re: xml processing line-by-line
by Jenda (Abbot) on Aug 04, 2007 at 18:49 UTC

If you know what the root tag is, then checking whether the current line contains the closing root tag is trivial. This works even if there are several possible root tags, as long as none of them can appear as a non-root tag under a different root.
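
Here is a sketch of the closing-tag check. It assumes the possible root names are known in advance; "command" and "query" are placeholders. It inherits the caveats of any purely textual check. A closing root tag inside a comment or CDATA section would end the document early. A self-closing root such as <query/> is not recognized. Anything after the closing tag on the same line ends up in the same document.

```perl
use strict;
use warnings;

# Hypothetical set of possible root elements; none of them may also appear
# as a nested element under a different root.
my @ROOT_TAGS = qw(command query);
my $root_alt  = join '|', map { quotemeta } @ROOT_TAGS;
my $root_end  = qr{</(?:$root_alt)\s*>};    # matches "</command>", "</query >", ...

# Collect lines until one contains a closing root tag; returns the document,
# or undef if the input ends first.
sub read_until_root_end {
    my ($fh) = @_;
    my $doc = '';
    while (defined(my $line = <$fh>)) {
        $doc .= $line;
        return $doc if $line =~ $root_end;
    }
    return;
}

open my $fh, '<', \"<command>\n  <arg>1</arg>\n</command>\n<query>\n</query>\n" or die $!;
while (defined(my $doc = read_until_root_end($fh))) {
    print "got document:\n$doc";
}
```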

Jenda
Support Denmark!
Defend the free world!
