tw has asked for the wisdom of the Perl Monks concerning the following question:

Hello Monks,

I am after some advice on how to read a CSV file into an array using Text::CSV_XS, starting from line X. The CSV files I am working with have an unusual header, so if I try to read the header an error is returned.

I can read a normal CSV file with the code below, skipping the first 42 lines in this case -- the code is essentially from the Text::CSV_XS synopsis with a few modifications.

    #!/usr/bin/perl -w
    #use strict;
    #use warnings;
    #use diagnostics;

    use Text::CSV_XS;

    my $file = 'test.csv';    # csv file name
    my @rows;                 # array that will store csv values

    my $csv = Text::CSV_XS->new ({ binary => 1 })
        or die "Cannot use CSV: " . Text::CSV_XS->error_diag ();

    # open file
    open my $FH, "<:encoding(utf8)", "$file" or die "$file: $!";

    # read file in while loop, keeping only the rows after the first 42
    my $i = 1;
    while (my $row = $csv->getline ($FH)) {
        if ($i > 42) { push @rows, $row; }
        $i++;
    }
    $csv->eof or $csv->error_diag ();

    # close file
    close $FH;

    # print contents of @rows array
    for my $print_rows (@rows) {
        print "@$print_rows\n";
    }

I tried replacing getline with getline_all, but I'm new to Perl and not certain of the syntax, or whether it will even work as I expect. Alternatively, is a different loop setup possible so that the initial lines are not parsed at all?

Thanks for any advice
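
For reference, a minimal sketch of the getline_all call asked about above. Its optional second argument is a row offset, so passing 2 skips the first two rows. The sample data and variable names are made up for illustration. One caveat, offered as an assumption rather than a documented guarantee: the skipped rows still seem to go through the parser, so this would probably not get past header lines that are not valid CSV.

```perl
use strict;
use warnings;
use Text::CSV_XS;

# Made-up sample data held in memory: two header rows, then two data rows.
my $data = "h1,h2\nh3,h4\n1,2,3\n4,5,6\n";
open my $fh, '<', \$data or die "in-memory open failed: $!";

my $csv = Text::CSV_XS->new ({ binary => 1, auto_diag => 1 })
    or die "Cannot use CSV: " . Text::CSV_XS->error_diag ();

# Second argument is the offset: skip the first 2 rows, return the rest.
my $rows = $csv->getline_all ($fh, 2);
close $fh;

print "@$_\n" for @$rows;
```

With headers that do not parse, discarding raw lines from the filehandle before handing it to the parser is likely the safer route.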

Replies are listed 'Best First'.
Re: Text::CSV_XS read file from line X
by GrandFather (Saint) on Jan 04, 2011 at 07:26 UTC

    If you know how many header lines you want to skip, you can simply read and discard them (2 lines in this example):

    <$FH> for 1 .. 2;

    before the while loop. If you need to test each line until you find the first data line, you could do the following (assuming header lines start with "skip"):

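    while (! eof $FH) {
        my $start = tell $FH;       # remember where this line begins
        my $line  = <$FH>;
        next if $line =~ /^skip/;   # still in the header: keep going
        seek $FH, $start, 0;        # rewind to the start of the first data line
        last;
    }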
    True laziness is hard work
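
The tell/seek scan above can be tried without a file on disk. The following is a minimal sketch using only core Perl; the helper name skip_header_lines, the /^skip/ pattern, and the sample data are all hypothetical and only serve the illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: advance $fh past leading lines that match $header_re,
# leaving the handle positioned at the start of the first non-matching line.
# Returns the number of lines skipped.
sub skip_header_lines {
    my ($fh, $header_re) = @_;
    my $skipped = 0;
    while (!eof $fh) {
        my $start = tell $fh;       # remember where this line begins
        my $line  = <$fh>;
        if ($line =~ $header_re) {
            $skipped++;
            next;
        }
        seek $fh, $start, 0;        # rewind to the start of the data line
        last;
    }
    return $skipped;
}

# Made-up sample data: two free-text header lines, then CSV rows.
my $data = qq{skip report "title\nskip generated by tool\n1,2,3\n4,5,6\n};
open my $fh, '<', \$data or die "in-memory open failed: $!";

my $skipped = skip_header_lines ($fh, qr/^skip/);
my $first   = <$fh>;
print "skipped $skipped header lines; first data line: $first";
```

After the helper returns, the same handle could be passed to $csv->getline, since the header lines were consumed as raw text and never reached the CSV parser.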

      Thanks GrandFather,

      the first solution is the best and also very elegant in this case.

      #!/usr/bin/perl
      use strict;
      use warnings;
      use diagnostics;

      use Text::CSV_XS;

      my $file = 'OR10055708.csv';    # csv file name
      my @rows;                       # array that will store csv values

      my $csv = Text::CSV_XS->new ({ binary => 1 })
          or die "Cannot use CSV: " . Text::CSV_XS->error_diag ();

      # open file
      open my $FH, "<:encoding(utf8)", "$file" or die "$file: $!";

      # skip first n (10 in this case) lines
      <$FH> for 1 .. 10;

      # read the remaining lines in a while loop
      while (my $row = $csv->getline ($FH)) {
          push @rows, $row;
      }
      $csv->eof or $csv->error_diag ();

      # close file
      close $FH;

      # print contents of @rows array
      for my $print_rows (@rows) {
          print "@$print_rows\n";
      }
Re: Text::CSV_XS read from line X
by Tux (Canon) on Jan 04, 2011 at 07:03 UTC

    Your program flow looks sane, and I cannot tell what is or would be the problem without seeing at least the error message you get. Better would be to also post the lines that cause the error.

    You already use error_diag () as you should, but you might consider changing the constructor call to new ({ binary => 1, auto_diag => 1 }).

    I also wonder why you use the -w flag on the hashbang line and have disabled use strict; and use warnings;.


    Enjoy, Have FUN! H.Merijn
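
A minimal sketch of the auto_diag suggestion above. With auto_diag => 1 the module reports parse errors on its own, so a failure is not silently missed when an explicit error_diag () call is forgotten. The input line is made up, and the exact diagnostic text is deliberately not reproduced here.

```perl
use strict;
use warnings;
use Text::CSV_XS;

# auto_diag => 1 makes the parser warn on errors without an explicit
# error_diag () call; a value greater than 1 should die instead.
my $csv = Text::CSV_XS->new ({ binary => 1, auto_diag => 1 })
    or die "Cannot use CSV: " . Text::CSV_XS->error_diag ();

# Made-up line: an unquoted field containing quote characters.
my $ok = $csv->parse (q{1,foo "bar" baz,42});

print $ok ? "parsed\n" : "parse failed, see the diagnostic on STDERR\n";
```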

      Hi Tux, thanks for the reply

      The error I get is: # CSV_XS ERROR: 2034 - EIF - Loose unescaped quote

      This happens when I try to read the entire CSV file. The header contains only lines of free text, with no systematic commas.

      Basically, I don't need the header lines from the CSV file in the array, and I can't read the whole file using CSV_XS because of the error above; otherwise I could just remove them once they were in the array.

      I was just experimenting with -w, use strict, etc., and forgot to adjust the code before posting, sorry.

      regards

        Could it be that your data at that line looks somewhat like

        1,"a",5,"ape " betablocker",16
                     ^

        In that case, if the data is still valid, you could add allow_loose_quotes => 1 to the options passed to new (); see the allow_loose_quotes entry in the Text::CSV_XS documentation.


        Enjoy, Have FUN! H.Merijn
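
A hedged sketch of the allow_loose_quotes suggestion. As far as the Text::CSV_XS documentation describes it, a loose quote inside a quoted field is only accepted when escape_char also differs from quote_char, so the backslash escape_char below is an assumption added for this example and not part of the advice above. Both input lines are made up.

```perl
use strict;
use warnings;
use Text::CSV_XS;

my $csv = Text::CSV_XS->new ({
    binary             => 1,
    auto_diag          => 1,
    allow_loose_quotes => 1,
    escape_char        => "\\",   # assumption: anything other than the quote_char
}) or die "Cannot use CSV: " . Text::CSV_XS->error_diag ();

my @parsed;
for my $line (q{1,foo "bar" baz,42}, q{1,"a",5,"ape " betablocker",16}) {
    # Collect the fields of each line that parses; failures are reported
    # by auto_diag and simply skipped here.
    push @parsed, [ $csv->fields ] if $csv->parse ($line);
}

print join (" | ", @$_), "\n" for @parsed;
```

Accepting loose quotes hides genuinely malformed input, so it is probably better reserved for data known to be valid apart from the quoting.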