in reply to Parsing CSV file

Hello Joma, and welcome to the Monastery!

You say your second script prints only 1, but when I run it I get the same result as you show for the first script. However, there is a logic error in your code: the foreach loop needs to be moved to within the while loop, otherwise you will only ever print out the fields for the final line read from the CSV file. (This isn’t apparent when there is only one line in the input file.) And you should always use strict and use warnings:

use strict;
use warnings;
use Text::ParseWords;

while (my $line = <DATA>) {
    my @fields = quotewords(',', 0, $line);
    for my $field (0 .. $#fields) {
        print $field + 1, " $fields[$field]\n";
    }
}

__DATA__
"earth",1,,"moon",9.374
"mars",2,,"phobos",,"deimos",

Output:

14:17 >perl 1671_SoPW.pl
1 earth
2 1
3 
4 moon
5 9.374
1 mars
2 2
3 
4 phobos
5 
6 deimos
7 
14:20 >
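Why the loop placement matters can be shown with a minimal sketch. It assumes the fields were assigned to the array on every pass through the while loop, as in the code above (the asker's second script is not reproduced in this thread). The names @lines, @last_only and @all_fields are made up for illustration, and the chomp is an addition that the code above does not have.

```perl
use strict;
use warnings;
use Text::ParseWords;

# Hypothetical sample data, mirroring the __DATA__ section above.
my @lines = (
    qq{"earth",1,,"moon",9.374\n},
    qq{"mars",2,,"phobos",,"deimos",\n},
);

my @last_only;     # assigned afresh for every line
my @all_fields;    # appended to for every line

for my $line (@lines) {
    chomp(my $text = $line);    # work on a copy without the trailing newline

    # Plain assignment replaces whatever the previous line left behind ...
    @last_only = quotewords(',', 0, $text);

    # ... whereas push keeps adding to the end of the array.
    push @all_fields, quotewords(',', 0, $text);
}

print "First field after assignment: $last_only[0]\n";
print "First field after push:       $all_fields[0]\n";
print "push kept more fields than assignment\n" if @all_fields > @last_only;
```

A print loop placed after the while loop sees only the last line's fields in the first case and every line's fields in the second, which would explain why code that uses push appears to work with a trailing print loop.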

BTW, good job on adding <code>...</code> tags to your second post. Two tips: (1) It would have been better to update your first post. When you post as a logged-in user, you can always update that post later. (Click on the “Edit” button in the top, right-hand part of the screen.) Just remember to mark updates as such to avoid confusing readers of the thread. (2) You should also put the output you get into <code> tags, to make it easier to read.

Hope that helps,

Athanasius <°(((><contra mundum Iustus alius egestas vitae, eros Piratica,

Re^2: Parsing CSV file
by Joma (Initiate) on Jul 06, 2016 at 18:31 UTC

    Hi Athanasius,

    The main reason for my initial post was that when I used a regular expression to parse a CSV file with one or more lines, as in the code below, it always worked fine. I couldn't figure out why the print loop outside the while loop didn't work when using quotewords from Text::ParseWords.

    $file = $ARGV[0] or die "Missing CSV file on the command line\n";
    open($text, '<:encoding(UTF-8)', $file) or die "Could not open '$file' $!\n";
    @fields = ();    # initialize @fields to be empty
    while ($line = <$text>) {
        chomp($line);    # remove the newline at the end of the line
        while ($line =~ m/"([^"\\]*(\\.[^"\\]*)*)",?|([^,]+),?|,/g) {
            push(@fields, defined($1) ? $1 : $3);    # add the matched field
        }
        # push(@fields, undef) if $line =~ m/,?/;    # account for an empty last field
    }
    foreach $field(0..$#fields) {
        print $field + 1 . " $fields[$field]\n";
    }
    close $file;

    I am a newcomer to Perl and I am really enjoying it. Thanks for your help.

      Hello Joma,

      I’ll make three observations on the regex code shown:

      1. There’s no point in capturing to $2 if that capture is never used. It would be better to use a non-capturing group here:

        while ($line =~ m/"([^"\\]*(?:\\.[^"\\]*)*)",?|([^,]+),?|,/g) {
        #                          ^^^
            push(@fields, defined($1) ? $1 : $2);
        #                                    ^^
        }

        See perlretut#Non-capturing-groupings.

      2. When testing for definedness, Perl’s // (logical defined-or) operator is useful and elegant:

        push @fields, $1 // $2;

        See perlop#Logical-Defined-Or.

      3. If you had use warnings at the head of your script (and you should!), you would get a Use of uninitialized value warning each time you try to print an array element whose value is undef. You can fix this easily by substituting an empty string:

        push @fields, $1 // $2 // '';
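The three observations can be put together in a short, self-contained sketch. The sample line in $line is made up for illustration, and the // operator needs Perl 5.10 or later.

```perl
use strict;
use warnings;

# Hypothetical sample line with one empty field in the middle.
my $line = q{"earth",1,,"moon",9.374};

my @fields;
while ($line =~ m/"([^"\\]*(?:\\.[^"\\]*)*)",?|([^,]+),?|,/g) {
    # $1 is set for a quoted field, $2 for an unquoted one;
    # neither is set for an empty field, so fall back to ''.
    push @fields, $1 // $2 // '';
}

for my $i (0 .. $#fields) {
    print $i + 1, " $fields[$i]\n";
}
```

With the empty-string fallback, the third field prints as an empty value instead of triggering a Use of uninitialized value warning.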

      Update: ++choroba for pointing out that the Branch Reset pattern (perlre#Extended-Patterns) is a more elegant option here.

      Hope that helps,

      Athanasius <°(((><contra mundum Iustus alius egestas vitae, eros Piratica,

        Using $1 // $2 smells like you can use the Branch Reset pattern (5.10+), which restarts the capture group numbering on each | :

        while ($line =~ m/(?|"([^"\\]*(?:\\.[^"\\]*)*)",?|([^,]+),?|,)/g) {
            print $1 // q(), "\n";
        }
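As a rough sketch of how the Branch Reset version could collect fields instead of printing them: the sample line and the @fields array are assumptions for illustration, not part of the original script.

```perl
use strict;
use warnings;

# Hypothetical sample line with two empty fields.
my $line = q{"mars",2,,"phobos",,"deimos"};

my @fields;
while ($line =~ m/(?|"([^"\\]*(?:\\.[^"\\]*)*)",?|([^,]+),?|,)/g) {
    # Inside (?|...) every alternative numbers its groups from 1,
    # so quoted and unquoted fields both arrive in $1.
    # The bare-comma alternative captures nothing, hence the fallback.
    push @fields, $1 // '';
}

print join('|', @fields), "\n";
```

Only one capture variable has to be checked, so the $1 // $2 chain shrinks to a single defined-or.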

        ($q=q:Sq=~/;[c](.)(.)/;chr(-||-|5+lengthSq)`"S|oS2"`map{chr |+ord }map{substrSq`S_+|`|}3E|-|`7**2-3:)=~y+S|`+$1,++print+eval$q,q,a,