
I have a set of data where each field is separated by a '|' and each record ends with a CRLF or LF. The problem I have run into is that the original application, which I have no control over, allows fields to contain CRLFs or LFs. One approach I thought of is to count the '|' characters in each row as it is read in and validate the number of fields; if there are not enough, chomp the CRLF or LF and join the next row to that one. This approach might work, but I was wondering what other approaches someone else may have used to clean data with this kind of problem. Here is a sample of the data:

EN|486822|||KKJSKA|L|L00219796|STR, JASON A|JASON|A|STR|||||3710 |NORTH CANTON|OH|44720|||000|0003053964|I|||
EN|486823|||YYYYYY|L|L00738657|OCID, SEAN M|SEAN|M|OCID|||||3846 Foxtail Lane

|CINCINNATI|OH|45248|||000|0009544289|I|||
EN|486824|||KXXXXP|L||DSBS, ANDREW J|ANDREW|J|DSBS|||||28835 STILXXXXXX|FARXXXXX HILLS|MI|48334|||000||I|||
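
For illustration, the pipe-counting idea described above could be sketched roughly as follows. This is only a minimal sketch, written in Python rather than Perl, and it rests on several assumptions: the leading '+' characters in the sample are display line-wrap markers rather than data, every complete record holds exactly 26 '|' delimiters (27 fields, counted from the sample rows), and the stray line break is simply dropped when the pieces are joined. The names `merge_broken_records`, `EXPECTED_DELIMS`, and `SAMPLE` are made up for this example.

```python
# Assumption: 26 delimiters per complete record, counted from the sample rows.
EXPECTED_DELIMS = 26

# The sample data with the display wrapping undone; the second record keeps
# its embedded line breaks (one CRLF and one LF, to show both are handled).
SAMPLE = (
    "EN|486822|||KKJSKA|L|L00219796|STR, JASON A|JASON|A|STR|||||3710 "
    "|NORTH CANTON|OH|44720|||000|0003053964|I|||\n"
    "EN|486823|||YYYYYY|L|L00738657|OCID, SEAN M|SEAN|M|OCID|||||"
    "3846 Foxtail Lane\r\n"
    "\n"
    "|CINCINNATI|OH|45248|||000|0009544289|I|||\n"
    "EN|486824|||KXXXXP|L||DSBS, ANDREW J|ANDREW|J|DSBS|||||"
    "28835 STILXXXXXX|FARXXXXX HILLS|MI|48334|||000||I|||\n"
)


def merge_broken_records(lines, expected_delims, delim="|", joiner=""):
    """Yield complete records, joining physical lines until the buffer
    holds the expected number of delimiters."""
    buffer = None
    for line in lines:
        line = line.rstrip("\r\n")  # the "chomp" step: drop CRLF or LF
        # Start a new buffer, or glue this line onto the unfinished record.
        buffer = line if buffer is None else buffer + joiner + line
        count = buffer.count(delim)
        if count < expected_delims:
            continue  # not enough fields yet, so keep reading
        if count > expected_delims:
            # Too many delimiters means the data does not match the
            # assumed layout; fail loudly instead of guessing.
            raise ValueError("too many delimiters in: %r" % buffer)
        yield buffer
        buffer = None
    if buffer:
        raise ValueError("incomplete trailing record: %r" % buffer)


if __name__ == "__main__":
    for record in merge_broken_records(SAMPLE.splitlines(), EXPECTED_DELIMS):
        print(record)
```

One limitation of this approach: if the last field of a record contains a line break, the delimiter count is already satisfied before the field ends, so the leftover text would be glued onto the front of the next record. In the sample every record ends in empty fields, so that case does not show up here.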

Thanks for any ideas. Mike

In reply to Clean data - where field contains a CRLF by sxmwb
