I've been tasked with parsing some daily "xml" files and gathering data from them. I put "xml" in quotes because the files are an abomination with a .xml extension. The problem is that each file contains MANY Windows newlines (CR LF) intermixed with the valid Unix line feeds. This results in things like:
    <FormattedReportObjects>
    <FormattedReportObject xsi:type="FormattedField" Type="xsd:long" FieldName="{Sum_ttx.
    E
    v
    e
    n
    t
    I
    D
    }
    "
    >
    <ObjectName>Field2</ObjectName>
    <FormattedValue>0</FormattedValue>

The newlines that LOOK correct are line feeds; wherever there is one character per line, each break is a <CR><LF>.
Does anyone have any ideas on how I could fix this? (The obvious "make your XML valid" has been tried and failed.) I've tried tr, sed, and perl one-liners, all to no avail, e.g.:

    perl -ne ' s/\r\n?//g; print ' foo.xml
    sed -e s/^M\n//g foo.xml
    tr -d ^M\n foo.xml

I appreciate any help anyone can provide. Thanks.
In reply to Scrubbing XML by the.duck