the.duck has asked for the wisdom of the Perl Monks concerning the following question:
I've been tasked with parsing some daily "xml" files and gathering data from them. I put "xml" in quotes because it is an abomination with a .xml extension. The issue is that the file contains MANY Windows newlines intermixed with the valid Unix line feeds. This results in things like:
<FormattedReportObjects>
<FormattedReportObject xsi:type="FormattedField" Type="xsd:long" FieldName="{Sum_ttx.
E
v
e
n
t
I
D
}
"
>
<ObjectName>Field2</ObjectName>
<FormattedValue>0</FormattedValue>

Where the newlines that LOOK correct are line feeds, and where there is a line per character is a <CR><LF>.
Anyone have any ideas on how I could fix this? (The obvious "make your XML valid" has been tried and failed.) I've tried tr, sed, and perl one-liners, all to no avail. E.g.:

perl -ne ' s/\r\n?//g; print ' foo.xml
sed -e s/^M\n//g foo.xml
tr -d ^M\n foo.xml

I appreciate any help anyone can provide. Thanks.
Re: Scrubbing XML
by anonymized user 468275 (Curate) on Apr 18, 2011 at 16:03 UTC
by the.duck (Novice) on Apr 18, 2011 at 16:12 UTC
by anonymized user 468275 (Curate) on Apr 18, 2011 at 16:24 UTC

Re: Scrubbing XML
by cdarke (Prior) on Apr 18, 2011 at 16:00 UTC
by Your Mother (Archbishop) on Apr 18, 2011 at 16:41 UTC

Re: Scrubbing XML
by halfcountplus (Hermit) on Apr 18, 2011 at 16:00 UTC

Re: Scrubbing XML
by Anonymous Monk on Apr 18, 2011 at 18:47 UTC
by locked_user sundialsvc4 (Abbot) on Apr 18, 2011 at 19:06 UTC
by ikegami (Patriarch) on Apr 18, 2011 at 19:17 UTC
by Your Mother (Archbishop) on Jun 02, 2011 at 13:15 UTC
by ikegami (Patriarch) on Jun 02, 2011 at 15:59 UTC
by the.duck (Novice) on Apr 18, 2011 at 19:48 UTC
by ikegami (Patriarch) on Apr 18, 2011 at 19:58 UTC