I concede that the difficulty of processing these two files suggests to me that something has gone very wrong with the input specifications. And since XML is part of that, I naturally assume it's the fault of XML. I might try to get them to change the specification such that each line is generally well formed XML, but cannot contain any new lines. Then just do a standard double line reader.