in reply to Re^2: Parsing CSV file
in thread Parsing CSV file

Hello Joma,

I’ll make three observations on the regex code shown:

  1. There’s no point in capturing to $2 if that capture is never used. It would be better to use a non-capturing group here:

    while ($line =~ m/"([^"\\]*(?:\\.[^"\\]*)*)",?|([^,]+),?|,/g) {
    #                          ^^^
        push(@fields, defined($1) ? $1 : $2);
    #                                    ^^
    }

    See perlretut#Non-capturing-groupings.

  2. When testing for definedness, Perl’s // (logical defined-or) operator, available since Perl 5.10, is useful and elegant:

    push @fields, $1 // $2;

    See perlop#Logical-Defined-Or.

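  3. If you had use warnings at the head of your script (and you should!), you would get a Use of uninitialized value warning each time you tried to print an array element whose value is undef. That happens for every empty field: the final , alternative matches without capturing anything, so both $1 and $2 are undef. You can fix this easily by substituting an empty string: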

    push @fields, $1 // $2 // '';
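Putting the three observations together, here is a minimal sketch of the whole loop. The sub name parse_fields and the sample line are my own, made up for illustration; the regex is the one shown in point 1.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the original script): collect the
# fields of one line using the regex from point 1.
sub parse_fields {
    my ($line) = @_;
    my @fields;
    while ($line =~ m/"([^"\\]*(?:\\.[^"\\]*)*)",?|([^,]+),?|,/g) {
        # $1 is set by the quoted alternative, $2 by the unquoted one.
        # The bare-comma alternative sets neither, hence the '' fallback.
        # // tests definedness, not truth, so a field of 0 is kept.
        push @fields, $1 // $2 // '';
    }
    return @fields;
}

# Made-up sample: the second field is empty, the third is quoted.
print join('|', map { "[$_]" } parse_fields('a,,"b,c",0')), "\n";
# prints [a]|[]|[b,c]|[0]
```

With use warnings enabled this runs silently; dropping the final // '' brings the Use of uninitialized value warning back for the empty second field.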

Update: ++choroba for pointing out that the Branch Reset pattern (perlre#Extended-Patterns) is a more elegant option here.

Hope that helps,

Athanasius <°(((>< contra mundum
Iustus alius egestas vitae, eros Piratica,

Re^4: Parsing CSV file
by choroba (Cardinal) on Jul 07, 2016 at 12:26 UTC
    Using $1 // $2 smells like you can use the Branch Reset pattern (5.10+), which restarts the capture group numbering on each | :

    while ($line =~ m/(?|"([^"\\]*(?:\\.[^"\\]*)*)",?|([^,]+),?|,)/g) {
        print $1 // q(), "\n";
    }
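    To see the renumbering at work, here is a small sketch that collects the fields instead of printing them. The sub name split_line and the sample input are made up for the example; the regex is the same one as above.

```perl
use strict;
use warnings;

# Hypothetical helper: split one line with the branch-reset regex.
sub split_line {
    my ($line) = @_;
    my @fields;
    while ($line =~ m/(?|"([^"\\]*(?:\\.[^"\\]*)*)",?|([^,]+),?|,)/g) {
        # Inside (?|...) each alternative numbers its groups from 1,
        # so quoted and unquoted fields both arrive in $1. The bare
        # comma branch has no group, which leaves $1 undef.
        push @fields, $1 // '';
    }
    return @fields;
}

# Made-up sample: quoted second field, empty third field.
print join('|', split_line('x,"y,z",,0')), "\n";
# prints x|y,z||0
```

    Only $1 ever needs checking here, so the $1 // $2 chain disappears.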

    ($q=q:Sq=~/;[c](.)(.)/;chr(-||-|5+lengthSq)`"S|oS2"`map{chr |+ord }map{substrSq`S_+|`|}3E|-|`7**2-3:)=~y+S|`+$1,++print+eval$q,q,a,