Joma has asked for the wisdom of the Perl Monks concerning the following question:

I recently became involved with regular expressions and, while experimenting with Perl scripts using Text::ParseWords, I came up with a script containing the following code. It reads a file containing the CSV text line:

    "earth",1,,"moon",9.374

    use Text::ParseWords;

    @fields = ();
    my $file = $ARGV[0] or die "Missing CSV file on the command line\n";
    open($text, '<', $file) or die "Could not open '$file' $!\n";
    $line = <$text>;
    @fields = quotewords(',', 0, $line);
    foreach $field (0..$#fields) {
        print $field + 1 . " $fields[$field]\n";
    }

The above works fine and prints:

    1 earth
    2 1
    3
    4 moon
    5 9.374

However, if I try the following,

    my $file = $ARGV[0] or die "Missing CSV file on the command line\n";
    open($text, '<', $file) or die "Could not open '$file' $!\n";
    while ($line = <$text>) {
        @fields = quotewords(',', 0, $line);
    }
    foreach $field (0..$#fields) {
        print $field + 1 . " $fields[$field]\n";
    }

it only prints:

    1

What am I missing here?

Re: Parsing CSV file
by Tux (Canon) on Jul 05, 2016 at 07:07 UTC

    Not to take away your motivation, but parsing CSV like this is a dead end. There are way too many edge cases to make this work reliably (if at all). There are two de facto CSV parsers available already that support all the options users will ask for: Text::CSV_XS and Text::CSV (which uses Text::CSV_XS when installed).
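    To make the edge-case point concrete, here is a small sketch (the sample line is invented for illustration). Standard CSV, as described in RFC 4180, escapes a quote inside a quoted field by doubling it, whereas quotewords expects backslash escapes. The doubled quotes are therefore read as separate, adjacent quoted strings and silently disappear:

```perl
use strict;
use warnings;
use Text::ParseWords qw(quotewords);

# Invented sample: the first field should be: say "hi"
# (RFC 4180 style, with the inner quotes doubled).
my $line   = q{"say ""hi""",2};

# quotewords() treats "say ", "hi" and "" as three quoted strings that
# happen to be adjacent, concatenates them, and drops the quote characters.
my @fields = quotewords(',', 0, $line);

print scalar(@fields), " fields: ", join('|', @fields), "\n";
```

    A real CSV parser such as Text::CSV would return say "hi" for that first field.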

    If your CSV restricts itself to the simplest CSV possible (comma-separated data, no embedded newlines, no mixed line endings, no Unicode, and only the default quotation and escape characters, both being ") then you can get away with the dead-simple approach, but Text::CSV::Easy_XS and Text::CSV::Easy_PP are already available for that.
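    For comparison, a minimal sketch of the same job with Text::CSV, using the getline loop shown in its documentation. The in-memory sample data is an assumption made so the example runs on its own; binary => 1 and auto_diag => 1 are the options commonly shown in the module's synopsis.

```perl
use strict;
use warnings;
use Text::CSV;    # uses Text::CSV_XS when installed, else the pure-Perl Text::CSV_PP

# Sample data held in memory so the sketch is self-contained;
# in practice you would open the file named on the command line.
my $data = qq{"earth",1,,"moon",9.374\n"say ""hi""",2,"a,b,c"\n};
open my $fh, '<', \$data or die "Cannot open in-memory file: $!";

my $csv = Text::CSV->new({ binary => 1, auto_diag => 1 });

my @rows;
while (my $row = $csv->getline($fh)) {
    push @rows, $row;    # $row is an array reference, one element per field
}
close $fh;

for my $row (@rows) {
    for my $i (0 .. $#$row) {
        printf "%d %s\n", $i + 1, $row->[$i];
    }
}
```

    Doubled quotes, quoted commas and empty fields are all handled by the parser, so none of the hand-written escaping logic is needed.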


    Enjoy, Have FUN! H.Merijn
Re: Parsing CSV file
by Athanasius (Archbishop) on Jul 05, 2016 at 04:28 UTC

    Hello Joma, and welcome to the Monastery!

    You say your second script prints only 1, but when I run it I get the same result as you show for the first script. However, there is a logic error in your code: the foreach loop needs to be moved to within the while loop, otherwise you will only ever print out the fields for the final line read from the CSV file. (This isn’t apparent when there is only one line in the input file.) And you should always use strict and use warnings:

        use strict;
        use warnings;
        use Text::ParseWords;

        while (my $line = <DATA>) {
            my @fields = quotewords(',', 0, $line);
            for my $field (0 .. $#fields) {
                print $field + 1, " $fields[$field]\n";
            }
        }

        __DATA__
        "earth",1,,"moon",9.374
        "mars",2,,"phobos",,"deimos",

    Output:

        14:17 >perl 1671_SoPW.pl
        1 earth
        2 1
        3
        4 moon
        5 9.374
        1 mars
        2 2
        3
        4 phobos
        5
        6 deimos
        7
        14:20 >
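    One plausible reason the second script printed only 1 (an assumption, since we cannot see the input file): if the file ends with an extra blank line, the last pass of the while loop hands quotewords a line containing just "\n". That comes back as a single field, and it is all the foreach loop after the while loop has left to print. A minimal sketch reproducing this, followed by a version that chomps, skips blank lines and prints inside the loop:

```perl
use strict;
use warnings;
use Text::ParseWords qw(quotewords);

# Hypothetical input: one CSV record followed by a blank line.
my $data = qq{"earth",1,,"moon",9.374\n\n};

# As in the question: @fields is overwritten on every pass of the loop.
open my $fh, '<', \$data or die "Cannot open in-memory file: $!";
my @fields;
while (my $line = <$fh>) {
    @fields = quotewords(',', 0, $line);
}
close $fh;
# The blank line "\n" parses as one field (the newline itself),
# so that single field is all that is left after the loop.
print "fields left after the loop: ", scalar(@fields), "\n";

# Safer: chomp, skip blank lines, and print each line inside the loop.
open $fh, '<', \$data or die "Cannot open in-memory file: $!";
my $records = 0;
while (my $line = <$fh>) {
    chomp $line;
    next unless length $line;    # ignore empty lines
    my @row = quotewords(',', 0, $line);
    $records++;
    for my $i (0 .. $#row) {
        printf "%d %s\n", $i + 1, $row[$i];
    }
}
close $fh;
```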

    BTW, good job on adding <code>...</code> tags to your second post. Two tips: (1) It would have been better to update your first post. When you post as a logged-in user, you can always update that post later. (Click on the “Edit” button in the top, right-hand part of the screen.) Just remember to mark updates as such to avoid confusing readers of the thread. (2) You should also put the output you get into <code> tags, to make it easier to read.

    Hope that helps,

    Athanasius <°(((><contra mundum Iustus alius egestas vitae, eros Piratica,

      Hi Athanasius, the main reason behind my initial post was that when I used a regular expression to parse a CSV file with one or more lines, as shown in the code below, it always worked fine. I couldn't figure out why the print outside the while loop didn't work when using quotewords from Text::ParseWords.

          $file = $ARGV[0] or die "Missing CSV file on the command line\n";
          open($text, '<:encoding(UTF-8)', $file) or die "Could not open '$file' $!\n";
          @fields = ();    # initialize @fields to be empty
          while ($line = <$text>) {
              chomp($line);    # remove the newline at the end of the line
              while ($line =~ m/"([^"\\]*(\\.[^"\\]*)*)",?|([^,]+),?|,/g) {
                  push(@fields, defined($1) ? $1 : $3);    # add the matched field
              }
              # push(@fields, undef) if $line =~ m/,?/;    # account for an empty last field
          }
          foreach $field (0..$#fields) {
              print $field + 1 . " $fields[$field]\n";
          }
          close $text;
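      The difference between the two versions comes down to push versus assignment. In the regex version, @fields is emptied once before the loop and every match is pushed onto it, so fields from all lines accumulate. In the quotewords version, @fields = quotewords(...) replaces the array on every line, so only the last line survives the loop. A sketch (sample data and variable names are illustrative) that keeps every line with quotewords, either as one flat list or as one array reference per line:

```perl
use strict;
use warnings;
use Text::ParseWords qw(quotewords);

# Sample data in memory so the sketch runs on its own.
my $data = qq{"earth",1,,"moon",9.374\n"mars",2,,"phobos",,"deimos"\n};
open my $fh, '<', \$data or die "Cannot open in-memory file: $!";

my @all_fields;    # flat list, like @fields in the regex version
my @rows;          # one array reference per CSV line
while (my $line = <$fh>) {
    chomp $line;
    my @fields = quotewords(',', 0, $line);
    push @all_fields, @fields;    # append, as the regex version does
    push @rows, [@fields];        # or keep each line's fields separate
}
close $fh;

# Printing after the loop now sees every line, not just the last one.
for my $r (0 .. $#rows) {
    my $row = $rows[$r];
    for my $i (0 .. $#$row) {
        printf "line %d field %d: %s\n", $r + 1, $i + 1, $row->[$i];
    }
}
```

      Keeping one array reference per line is usually more useful than the flat list, because you can still tell which field came from which line.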

      I am a newcomer to Perl and I am really enjoying it. Thanks for your help.

        Hello Joma,

        I’ll make three observations on the regex code shown:

        1. There’s no point in capturing to $2 if that capture is never used. It would be better to use a non-capturing group here:

          while ($line =~ m/"([^"\\]*(?:\\.[^"\\]*)*)",?|([^,]+),?|,/g) {
          #                          ^^^
              push(@fields, defined($1) ? $1 : $2);
          #                                    ^^
          }

          See perlretut#Non-capturing-groupings.

        2. When testing for definedness, Perl’s // (logical defined-or) operator is useful and elegant:

          push @fields, $1 // $2;

          See perlop#Logical-Defined-Or.

        3. If you had use warnings at the head of your script (and you should!), you would get a "Use of uninitialized value" warning each time you try to print an array element whose value is undef. You can fix this easily by substituting an empty string:

          push @fields, $1 // $2 // '';

        Update: ++choroba for pointing out that the Branch Reset pattern (perlre#Extended-Patterns) is a more elegant option here.
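        A minimal sketch of the branch reset suggestion, assuming Perl 5.10 or later (needed for both (?|...) and //). Inside (?|...) each alternative numbers its captures from $1, so the quoted and unquoted cases both land in $1. The helper name split_csv_line is hypothetical:

```perl
use strict;
use warnings;
use 5.010;    # branch reset (?|...) and // both need Perl 5.10+

# Hypothetical helper: split one CSV line with the regex from this thread,
# rewritten with a branch reset so only $1 is ever needed.
sub split_csv_line {
    my ($line) = @_;
    my @fields;
    while ($line =~ m/(?|"([^"\\]*(?:\\.[^"\\]*)*)",?|([^,]+),?|,)/g) {
        push @fields, $1 // '';    # a bare comma captures nothing, so use ''
    }
    return @fields;
}

my @fields = split_csv_line(q{"earth",1,,"moon",9.374});
print join('|', @fields), "\n";
```

        Like the original regex, this does not report an empty field after a trailing comma, and escaped quotes inside a quoted field keep their backslashes.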

        Hope that helps,

        Athanasius <°(((><contra mundum Iustus alius egestas vitae, eros Piratica,

Re: Parsing CSV file
by genio (Beadle) on Jul 05, 2016 at 01:31 UTC
    When you have CSV data, it's best to use a CSV parser:

    https://metacpan.org/pod/Text::CSV

    There is a good amount of sample code for you right there in the docs. However, your particular error is one of scoping and array index syntax.
        #!/usr/bin/env perl
        use strict;
        use warnings;
        use v5.10;
        use Text::ParseWords;

        my $file = $ARGV[0] or die "Need a CSV file";
        open my $fh, '<:encoding(UTF-8)', $file or die "Oops: $file $!";
        while (my $line = <$fh>) {
            chomp $line;
            my @fields = quotewords(',',0,$line);
            for my $i (0..$#fields) {
                say "$i $fields[$i]";
            }
        }

    Please note that the entire code block above is untested and I've never used Text::ParseWords. I simply built an example from your supplied attempt.

      I normally use a parser based on regular expressions, but decided to try this approach as well. Your solution does work. Thank you.

Re: Parsing CSV file
by duyet (Friar) on Jul 05, 2016 at 06:14 UTC
    It would be OK if you have control over your input file. If not, and a string contains commas, then it won't work.
    "string",2,,"a,b,c",1.234
    The Perl Cookbook has a section, Parsing Comma-Separated Data, which covers the topic more thoroughly.
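    A quick check of duyet's sample line (a sketch; the naive split is included only for contrast): splitting on every comma breaks the quoted "a,b,c" apart, while quotewords honours the double quotes and keeps it as one field. The cases that still trip up quotewords are doubled quotes and fields containing embedded newlines, since each line is parsed separately.

```perl
use strict;
use warnings;
use Text::ParseWords qw(quotewords);

my $line = q{"string",2,,"a,b,c",1.234};

# A naive split on commas ignores the quotes and breaks "a,b,c" apart.
my @naive = split /,/, $line;

# quotewords() honours the double quotes, so "a,b,c" stays one field.
my @parsed = quotewords(',', 0, $line);

print scalar(@naive),  " naive fields: ",  join('|', @naive),  "\n";
print scalar(@parsed), " parsed fields: ", join('|', @parsed), "\n";
```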
Re: Parsing CSV file
by genio (Beadle) on Jul 05, 2016 at 01:33 UTC