in reply to Re^3: Modifying CSV File
in thread Modifying CSV File

Here is the code I am using (with thanks to AnomalousMonk for the regex):

while (<$fh1>) {
    chomp;
    next if $_ eq '';
    s{ ("[^"]+") }{ (my $one = $1) =~ s{,}{-}xmsg; $one =~ s{"}{}g; $one; }xmsge;
    print $_, "\n";
}

The test file you made should be sufficient, because the only changes I make are replacing commas with dashes and removing the quotes in the one column in question.
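
To make the effect concrete, here is a minimal sketch that applies the same substitution to a single record. The sample line and the helper name `dash_quoted` are hypothetical, not taken from the real data file.

```perl
use strict;
use warnings;

# Hypothetical helper: applies the substitution from the loop above to one line.
# Commas inside each quoted field become dashes, then the quotes are dropped.
sub dash_quoted {
    my ($line) = @_;
    $line =~ s{ ("[^"]+") }{ (my $one = $1) =~ s{,}{-}xmsg; $one =~ s{"}{}g; $one; }xmsge;
    return $line;
}

# Made-up sample record; the real file layout is assumed to look similar.
print dash_quoted('1001,"Smith, John, Jr.",42'), "\n";
```

This should print `1001,Smith- John- Jr.,42`. A line without any quoted field passes through unchanged.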

"Its not how hard you work, its how much you get done."

Re^5: Modifying CSV File
by Tux (Canon) on Jun 18, 2015 at 12:41 UTC

    That ran in 4.194 seconds on my dataset, which can be reduced by simplifying the regex even more.

    open my $io, "<", "test.csv";
    open my $oh, ">", "out.csv";
    while (<$io>) {
        s{ ("[^""]+") }{ (my $one = $1) =~ tr{,}{-}; $one =~ tr{""}{}d; $one; }xge;
        print $oh $_;
    }

    This version runs in 3.229 seconds. All regex-based scripts will fail if

    • the first field is quoted;
    • the second field has an embedded double-quote (or an escaped character, with the default " as the escape character);
    • any record anywhere in the dataset has an embedded newline, and the data after the newline has a double-quote
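
    The second and third cases can be reproduced with a few made-up records. This is only a sketch: the sample lines are hypothetical, the backslash escape style is assumed for illustration, and `dash_quoted` is an illustrative wrapper around the tr-based substitution above.

    ```perl
    use strict;
    use warnings;

    # Illustrative wrapper around the tr-based substitution shown above.
    sub dash_quoted {
        my ($line) = @_;
        $line =~ s{ ("[^""]+") }{ (my $one = $1) =~ tr{,}{-}; $one =~ tr{""}{}d; $one; }xge;
        return $line;
    }

    # Hypothetical records, handled one physical line at a time as in the loop.
    my @lines = (
        '1,"a \" b, c",3',   # backslash-escaped quote inside the field (assumed escape style)
        '2,"one,',           # first half of a record with an embedded newline
        'two",9',            # second half of that record
    );

    print dash_quoted($_), "\n" for @lines;
    ```

    The first record comes out as `1,a \ b, c",3`: the comma is kept and a stray quote is left behind. The two halves of the second record pass through unchanged, so the comma in that field is never replaced.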

    As long as you are absolutely certain that the CSV data is uniformly and consistently laid out as in these two lines, you are safe.

    Personally, I would never take that risk unless those two seconds are a problem. 5 seconds for 1.4 million records is pretty fast, knowing it is always safe.
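
    For comparison, a parser-based version might look roughly like the sketch below. It rests on several assumptions: that the safe alternative is a real CSV parser such as Text::CSV_XS, that the field to change is the second column, and that unquoted output is wanted wherever quoting is no longer required. The helper name `dash_second_field` and the sample records are hypothetical.

    ```perl
    use strict;
    use warnings;
    use Text::CSV_XS;

    # Hypothetical helper: copy CSV records from $in to $out, replacing commas
    # in the second field (an assumption about the layout) with dashes.
    sub dash_second_field {
        my ($in, $out) = @_;
        # binary => 1 allows embedded newlines; auto_diag => 1 reports parse errors;
        # quote_space => 0 stops a plain space from forcing quotes on output.
        my $csv = Text::CSV_XS->new({ binary => 1, auto_diag => 1, quote_space => 0 });
        while (my $row = $csv->getline($in)) {
            $row->[1] =~ tr{,}{-} if defined $row->[1];
            $csv->say($out, $row);   # quotes are written only where still required
        }
        return;
    }

    # Small in-memory demonstration with made-up records, one of which has an
    # embedded newline inside the quoted field.
    my $input = qq{1,"Smith, John",42\n2,"one,\ntwo",9\n};
    open my $in,  "<", \$input     or die "in-memory input: $!";
    open my $out, ">", \my $output or die "in-memory output: $!";
    dash_second_field($in, $out);
    close $out;
    print $output;
    ```

    For real data, the same helper can be given handles opened on test.csv and out.csv. The record with the embedded newline is still written with quotes, because the newline inside the field requires them.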


    Enjoy, Have FUN! H.Merijn
      Thanks for the regex mods. I am certain the CSV file will always be in that format: it comes from another part of the system, and if it were to change, I would be the one asked to change it.

      "Its not how hard you work, its how much you get done."