ezekiel has asked for the wisdom of the Perl Monks concerning the following question:

I have been given a tab separated flat file to process. Each line has at least three strings separated by tabs and (of course) a newline at the end.

I have been using a simple script to open the file, read each line, trim the newline, and split the line into an array on tabs, i.e.:

while (<FILE>) { chomp; @line = split("\t"); ... do things to @line... }

Recently I have found that some strings in the line have carriage returns in them. This throws my simple routine into chaos because it is now treating the one "entry" (or line) as multiple entries (or lines).

Any ideas as to how to get around this problem?

Thanks.

Replies are listed 'Best First'.
Re: Parsing lines containing extra carriage returns
by bikeNomad (Priest) on Jun 13, 2001 at 06:27 UTC
    If you have carriage returns (in addition to newlines) in your lines, just delete them first:

    while (<FILE>) { tr/\r\n//d; # etc.
    The above will replace chomp as well.

    If you have blank lines (two consecutive newlines), you can just detect them after the tr:

    tr/\r\n//d; next if $_ eq '';
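A minimal, self-contained sketch of the approach above, assuming the stray characters are bare carriage returns (\r) inside a field while each record still ends in a newline. The sample data, the in-memory filehandle, and the names $data and @rows are made up for illustration; they are not from the original post.

```perl
use strict;
use warnings;

# Hypothetical sample data: the second record has a bare "\r" inside a field,
# and there is one blank line before the last record.
my $data = "a\tb\tc\n" . "d\te\rf\tg\n" . "\n" . "h\ti\tj\n";

# Read from an in-memory filehandle so the example needs no external file.
open my $fh, '<', \$data or die "Cannot open in-memory file: $!";

my @rows;
while (my $line = <$fh>) {
    $line =~ tr/\r\n//d;                     # strip every CR and LF (replaces chomp)
    next if $line eq '';                     # skip blank lines
    push @rows, [ split /\t/, $line, -1 ];   # -1 keeps trailing empty fields
}
close $fh;

printf "%d rows; second row, field 2: %s\n", scalar @rows, $rows[1][1];
# prints "3 rows; second row, field 2: ef"
```

Note that this only helps when the embedded character is not also the input record separator. If a field contains a real newline, the read still stops there and the record arrives in two pieces.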
Re: Parsing lines containing extra carriage returns
by lemming (Priest) on Jun 13, 2001 at 06:48 UTC
    This will do it, but it will get rid of the returns in the data. If that's not your intention, you'll need to be a bit more fancy than the "." operator.
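    my $number_of_fields = 10;
    my $hold = "";
    while (<FILE>) {
        chomp;
        my $line = $hold . $_;                  # glue this line onto any held partial record
        my @line = split("\t", $line, -1);      # -1 keeps trailing empty fields in the count
        if (@line < $number_of_fields) {        # too few fields: record continues on the next line
            $hold = $line;
            next;
        }
        $hold = "";
        # do stuff with @line
    }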

    Update:
    Forgot to mention that you need to know the # of fields that you are expected to have. And error checking would be desired.
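If the embedded line breaks should be kept rather than dropped, one possible variation is to rejoin the pieces with "\n" instead of plain concatenation. This is only a sketch under stated assumptions: every record has exactly $number_of_fields fields (3 here, to match the question), the break never falls inside the last field, and the sample data and the names $data and @records are hypothetical.

```perl
use strict;
use warnings;

my $number_of_fields = 3;   # assumed: known, fixed field count per record

# Hypothetical sample: the second record has a line break inside its 2nd field.
my $data = "a\tb\tc\n" . "d\te1\ne2\tf\n" . "g\th\ti\n";
open my $fh, '<', \$data or die "Cannot open in-memory file: $!";

my @records;
my $hold;                   # partial record carried over from earlier lines
while (my $line = <$fh>) {
    chomp $line;
    # Rejoin with "\n" rather than ".", so the embedded break survives.
    $hold = defined $hold ? "$hold\n$line" : $line;
    my @fields = split /\t/, $hold, -1;      # -1 keeps trailing empty fields
    next if @fields < $number_of_fields;     # record incomplete: read more
    warn "Too many fields near line $.\n" if @fields > $number_of_fields;
    push @records, \@fields;
    undef $hold;
}
warn "Incomplete final record discarded\n" if defined $hold;
close $fh;

printf "%d records\n", scalar @records;
```

A break inside the last field cannot be detected this way, because the record already has the expected number of fields when the first physical line is read; the continuation would then be treated as the start of a new record.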

Re: Parsing lines containing extra carriage returns
by jeroenes (Priest) on Jun 13, 2001 at 09:34 UTC
    If bikeNomad's solution doesn't work, just split the whole thing on tabs (untested):
    my $file;
    {
        local $/ = undef;    # learned something ;}
        $file = <>;          # or <FILE>
    }
    $file =~ tr/\r//d;       # remove carriage returns
    my @items = split /[\t\n]/, $file;    # one array of items
    my $rowsize = 3;
    my @AoA;
    while (@items >= $rowsize) {
        push @AoA, [splice @items, 0, $rowsize];
    }
    ... this only works if you're guaranteed to have 3 columns.

    Hope this helps,

    Jeroen
    "We are not alone"(FZ)