Re: Clean data - where field contains a CRLF

The following works for the sample data you have given. Note that the field count is hard-wired (strictly, it is the number of | delimiters per record) and that the code dies if a record ends up with more delimiters than expected.

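use strict;
use warnings;

# Number of '|' delimiters in a complete record
use constant FIELDS => 26;

my $line = '';    # accumulates the pieces of a record broken across lines

while (<DATA>) {
    s/\r//g;      # drop stray carriage returns
    chomp;
    $line .= $_;
    my $fields = $line =~ tr/|//;    # count the delimiters seen so far
    next if FIELDS > $fields;        # record incomplete: read another line

    die "Field count too great in line $." if FIELDS < $fields;
    my @fields = split /\|/, $line;  # trailing empty fields are dropped
    $line = '';
    print join ' ', @fields, "\n\n";
}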

__DATA__
EN|486822|||KKJSKA|L|L00219796|STR, JASON A|JASON|A|STR|||||3710 |NORTH CANTON|OH|44720|||000|0003053964|I|||
EN|486823|||YYYYYY|L|L00738657|OCID, SEAN M|SEAN|M|OCID|||||3846 Foxtail Lane

|CINCINNATI|OH|45248|||000|0009544289|I|||
EN|486824|||KXXXXP|L||DSBS, ANDREW J|ANDREW|J|DSBS|||||28835 STILXXXXXX|FARXXXXX HILLS|MI|48334|||000||I|||

Prints:

EN 486822   KKJSKA L L00219796 STR, JASON A JASON A STR     3710  NORTH CANTON OH 44720   000 0003053964 I 

EN 486823   YYYYYY L L00738657 OCID, SEAN M SEAN M OCID     3846 Foxtail Lane CINCINNATI OH 45248   000 0009544289 I 

EN 486824   KXXXXP L  DSBS, ANDREW J ANDREW J DSBS     28835 STILXXXXXX FARXXXXX HILLS MI 48334   000  I
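One thing worth noticing in the printed records: the empty fields at the end of each record are gone, because split discards trailing empty fields unless it is given a negative limit. If those empty columns need to survive, something like the following might do. This is only a sketch; the sub name split_record and the shortened sample record are made up for illustration and are not part of the code above.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the original post): split a joined record
# on '|' and keep trailing empty fields by passing a negative limit.
sub split_record {
    my ($record) = @_;
    return split /\|/, $record, -1;
}

my $record  = 'EN|486824|||KXXXXP|L||I|||';    # shortened, invented sample
my @default = split /\|/, $record;             # trailing empty fields dropped
my @all     = split_record($record);           # trailing empty fields kept

printf "default split: %d fields, limit -1: %d fields\n",
    scalar(@default), scalar(@all);
```

For this shortened record the default split yields 8 fields and the -1 version yields 11, one more than the number of delimiters.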

DWIM is Perl's answer to Gödel

Re^2: Clean data - where field contains a CRLF by graff (Chancellor) on Aug 21, 2006 at 01:08 UTC
Minor nitpick, Grampa:

    # s/\r//g;
    # chomp;
    # expressed better (less platform dependent) as:
    s/[\r\n]+//g;
    # or, to be compulsive, use the numerics:
    s/[\x0a\x0d]+//g;

According to the Perl docs I've seen, chomp "removes any trailing string that corresponds to the current value of $/". If Perl has $/ set to "\r\n", taking away the "\r" before chomping might cause the chomp to do nothing at all. (But I'm not a Windows user, so I could be wrong about that.)

Also, depending on the data and the task, it might make more sense to replace every `[\r\n]+` with a space rather than an empty string, especially if consecutive lines will be concatenated into a single string.
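The space-instead-of-nothing idea, sketched on a single field value. The sub name flatten_line_ends and the "Suite 4" text are invented for illustration. Note that inside the record-joining loop this would also leave a space wherever a physical line ends just before a delimiter, so some trimming may be wanted there.

```perl
use strict;
use warnings;

# Hypothetical helper: replace every run of CR/LF characters with one space,
# so text on either side of an embedded line break stays separated.
sub flatten_line_ends {
    my ($text) = @_;
    $text =~ s/[\r\n]+/ /g;
    return $text;
}

# A field with an embedded CRLF pair, similar to the address field in the
# sample data ("Suite 4" is made up).
my $field = "3846 Foxtail Lane\r\n\r\nSuite 4";
print flatten_line_ends($field), "\n";
```

With an empty-string replacement the same field would come out as "3846 Foxtail LaneSuite 4", which is the case the space avoids.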
Re^3: Clean data - where field contains a CRLF by GrandFather (Saint) on Aug 21, 2006 at 02:37 UTC
Possibly a Mac issue, but not a Windows issue. Perl's I/O processing will already have converted CRLF to \n under Windows. The code I posted was tested on Windows. However, I agree that your regex solution is likely to be better.

I'd avoid the "numeric" version though. That makes it more, rather than less, sensitive to OS and character sets. Perl converts native line ends to \n (which may or may not be an actual newline character) and sets $/ to \n by default, so it doesn't matter what the native OS line-end convention is, and it doesn't matter what character encoding is used: \n processing using non-binary mode I/O should be portable with Perl.
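For anyone who wants to see the chomp question played out, here is a small sketch that works on plain strings, so no I/O layers are involved. The sub name strip_then_chomp is hypothetical. It applies the two steps from the original loop under a given value of $/, which is set explicitly here only to compare the default "\n" with the "\r\n" case raised above.

```perl
use strict;
use warnings;

# Hypothetical helper: remove CRs, then chomp, as in the original loop body,
# with the input record separator supplied by the caller.
sub strip_then_chomp {
    my ($text, $separator) = @_;
    local $/ = $separator;    # chomp removes a trailing copy of $/
    $text =~ s/\r//g;
    chomp $text;
    return $text;
}

# Default case: $/ is "\n", so the two steps remove the whole CRLF.
my $default = strip_then_chomp("abc\r\n", "\n");

# Hypothetical case: $/ is "\r\n". Once the "\r" is gone the text no longer
# ends with $/, so chomp leaves the "\n" in place.
my $crlf_sep = strip_then_chomp("abc\r\n", "\r\n");

# The character-class version does not depend on $/ at all.
(my $regex = "abc\r\n") =~ s/[\r\n]+//g;

printf "lengths: %d %d %d\n",
    length($default), length($crlf_sep), length($regex);
```

This prints lengths 3, 4 and 3: the chomp only comes up empty-handed when $/ has been changed to "\r\n", which is not the default.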