Boy, does this look problematic! If there are newlines embedded in the records, then what is the record delimiter? Presumably, the answer is that once the expected number of pipe-delimited fields has gone by, the next newline delimits the record; or more exactly, newlines within the message field are not record delimiters, but a newline outside of that field is a record delimiter. Because you don't seem to have control over the data that you are being fed, it looks like you are stuck with an extremely poorly formatted data set. This problem is only tractable if you promise that there won't be any pipes inside the field data itself. Promise? ;)
The first problem, then, is that the angle bracket operator (that is, <DATA> in your example) splits the file input on each newline by default, so each $line you read will be cut off at the newlines embedded in the message section of the text. To properly parse this, your best bet is probably to slurp the entire file in one fell swoop by temporarily undefining $/ (when the input record separator is undef, a single read returns everything up to the end of the file). Doing this with local $/ inside a block means the old value is restored automatically afterwards. Then you will have to "manually" parse the file by looking for the first newline following five pipe characters to build the records. This is not a trivial task, mind you!
This is just intended as an overview of the method I would recommend. Given some time, I may work out a code implementation, or maybe some other monk will oblige before I get the chance.
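Until then, here is a rough sketch of the idea, not a tested solution for your real data. It assumes things I can't verify from your post: that every record has exactly five pipes (six fields), that no field contains a literal pipe, and that the last field never contains a newline. The names parse_records, $PIPES and $sample are made up for illustration, and the sample data is invented.

```perl
use strict;
use warnings;

# ASSUMPTION: five pipes (six fields) per record; adjust to your layout.
my $PIPES = 5;

# Split slurped text into records. Any field before the last pipe may
# contain newlines; the first newline after the fifth pipe ends the record.
sub parse_records {
    my ($text) = @_;
    my @records;
    # \G anchors each match where the previous one stopped, so the loop
    # ends at the first stretch of text that does not look like a record.
    while ($text =~ /\G((?:[^|]*\|){$PIPES}[^|\n]*)(?:\n|\z)/g) {
        # The -1 limit keeps trailing empty fields instead of dropping them.
        push @records, [ split /\|/, $1, -1 ];
    }
    return @records;
}

# Invented sample: the third field of the first record spans two lines.
my $sample = <<'END';
1|alice|first line
second line|x|y|z
2|bob|single line|p|q|r
END

# Slurp the whole input at once; local $/ is undef only inside this block.
open my $fh, '<', \$sample or die "Cannot open in-memory file: $!";
my $text = do { local $/; <$fh> };
close $fh;

for my $rec (parse_records($text)) {
    (my $msg = $rec->[2]) =~ s/\n/\\n/g;    # make embedded newlines visible
    print scalar(@$rec), " fields, message: $msg\n";
}
```

With your own data you would read from <DATA> or a real filehandle instead of the in-memory one. Note that this sketch silently stops at the first chunk that does not fit the pattern, and a blank line between records would be absorbed into the first field of the next record, so real code should check that the whole input was consumed.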
-
HZ