in reply to Clean data - where field contains a CRLF

Grandfather's approach to your problem is more direct than this. Starting from your post's title, I anticipated a different question: cleaning data that is supposed to contain newlines, e.g. an address stored as a single field that can be printed directly.

I doubt my line ends carry the same trailing trash as yours, so check the substitution marked "trailing trash" in the code against your real data.

Be well,
rir

#!/usr/bin/perl
use warnings;
use strict;

my $separator = '|';
my $input = "";
while (<DATA>) {
    $input .= $_;
    if ($input =~ m/ ^              # start of string
                     (              # group 1 is
                       (?:          # group 2 not memo'd is
                         [^|]*\|    # any # not-pipe then a pipe
                       ) {26}       # repeated 26 times total
                       $            # and reaching the end of string
                     )
                   /sx              # while matching \n with .
       ) {
        my $record = $1;
        $input =~ s/^\Q$record\E\n//s;   # trailing trash??
        my @arr = split /\|/, $record;
        # process @arr
    }
}
__DATA__
EN|486822|||KKJSKA|L|L00219796|STR, JASON A|JASON|A|STR|||||3710 |NORTH CANTON|OH|44720|||000|0003053964|I|||
EN|486823|||YYYYYY|L|L00738657|OCID, SEAN M|SEAN|M|OCID|||||3846 Foxtail Lane |CINCINNATI|OH|45248|||000|0009544289|I|||
EN|486824|||KXXXXP|L||DSBS, ANDREW J|ANDREW|J|DSBS|||||28835 STILXXXXXX|FARXXXXX HILLS|MI|48334|||000||I|||

Re^2: Clean data - where field contains a CRLF
by sxmwb (Pilgrim) on Aug 21, 2006 at 13:18 UTC
    I think you understand where I was heading. One thought I had after sleeping on it: could you split the multi-line records by matching and capturing the EN, since it is the first thing in each record, and then remove the extra CRLFs except for the last one? I do not know whether split can do this across multiple lines. Thoughts?

    Thanks,
    Mike
      Yes, you can. split doesn't care about newlines unless they are in its first argument, the pattern.

      Be well,
      rir
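
A minimal sketch of the split idea discussed above, not code from either poster. It assumes the whole file has been slurped into one string, that every record starts with "EN|" at the beginning of a line, and that no continuation line begins with "EN|". The helper name split_records and the sample buffer are hypothetical, made up for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: break a slurped buffer into records that each begin
# with "EN|" at the start of a line, then drop the CR/LF left inside them.
sub split_records {
    my ($buffer) = @_;

    # The zero-width lookahead keeps "EN|" attached to the record it starts.
    # /m lets ^ match after every embedded newline, not only at position 0.
    my @records = split /^(?=EN\|)/m, $buffer;

    for my $record (@records) {
        $record =~ s/[\r\n]+//g;    # remove embedded and trailing CR/LF
    }
    return grep { length } @records;
}

# Made-up sample: the first record has a CRLF inside its address field.
my $buffer = "EN|1|12 Main St\r\n|Springfield|\r\n"
           . "EN|2|34 Oak Ave|Shelbyville|\r\n";

# One newline is added back per record when printing.
print "$_\n" for split_records($buffer);
```

Because the pattern only matches where a line starts with "EN|", a newline in the middle of a field never triggers a split; it simply stays in the chunk until the substitution removes it. If a field could legitimately start a line with "EN|", counting pipes as in the parent post would be the safer test.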