Re^4: Modifying CSV File

Here is the code I am using (with thanks to AnomalousMonk for the regex):

while (<$fh1>) {
    chomp;
    next if $_ eq '';
    s{ ("[^"]+") }{ (my $one = $1) =~ s{,}{-}xmsg; $one =~ s{"}{}g; $one; }xmsge;
    print $_, "\n";
}

The test file you made should be sufficient because the only thing I am changing is the comma to a dash and removing the quotes from the one column in question.
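For anyone following along, here is a minimal sketch of what that substitution does to a single line. The helper name `dequote_field` and the sample line are made up for illustration, and it uses `tr` in place of the inner `s///` calls, which should be equivalent for these single-character replacements.

```perl
use strict;
use warnings;

# Hypothetical helper: applies the quoted-field substitution to one line.
sub dequote_field {
    my ($line) = @_;
    # For each "..." chunk: turn commas into dashes, then drop the quotes.
    $line =~ s{ ("[^"]+") }{ (my $one = $1) =~ tr{,}{-}; $one =~ tr{"}{}d; $one }xge;
    return $line;
}

# Made-up sample line, not taken from the real data.
print dequote_field('1,"Smith, John",42'), "\n";   # prints: 1,Smith- John,42
```

Unquoted fields are left untouched, since the pattern only matches text between a pair of double-quotes.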

"Its not how hard you work, its how much you get done."

Re^5: Modifying CSV File by Tux (Canon) on Jun 18, 2015 at 12:41 UTC
That ran in `4.194` seconds on my dataset, which can be reduced by simplifying the regex even more:

open my $io, "<", "test.csv";
open my $oh, ">", "out.csv";
while (<$io>) {
    s{ ("[^""]+") }{ (my $one = $1) =~ tr{,}{-}; $one =~ tr{""}{}d; $one; }xge;
    print $oh $_;
}

This runs in `3.229` seconds.

All regex-based scripts will fail if:

- the first field is quoted;
- the second field has an embedded double-quote (or an escaped character, with the default " as escape);
- any record anywhere in the dataset has an embedded newline, and the data after the newline has a double-quote.

As long as you are absolutely certain that the CSV data is uniformly and consistently laid out as in these two lines, you are safe. I would personally never take that risk, unless those two seconds are a problem. 5 seconds for 1.4 million records is pretty fast, knowing it is always safe.

Enjoy, Have FUN! H.Merijn
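To illustrate the embedded-newline case mentioned above, here is a small sketch under stated assumptions: the record is made up, and `fix_line` is a hypothetical wrapper around the same substitution. Because the loop reads one physical line at a time, neither half of the quoted field contains a matching pair of quotes, so nothing is changed.

```perl
use strict;
use warnings;

# Hypothetical wrapper around the substitution discussed in this thread.
sub fix_line {
    my ($line) = @_;
    $line =~ s{ ("[^"]+") }{ (my $one = $1) =~ tr{,}{-}; $one =~ tr{"}{}d; $one }xge;
    return $line;
}

# Made-up record whose quoted field spans two physical lines.
my $record = qq{1,"first line\nsecond, line",2\n};

# Read it line by line from an in-memory filehandle, as the while loop would.
open my $in, '<', \$record or die "Cannot open in-memory file: $!";
my @out = map { fix_line($_) } <$in>;
close $in;

# The comma inside the quoted field and both quotes survive untouched.
print @out;
```

A CSV parser that understands quoting would treat both physical lines as one record, which is why this case only shows up with line-based regex processing.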
Re^6: Modifying CSV File by roho (Bishop) on Jun 19, 2015 at 15:38 UTC
Thanks for the regex mods. I am certain the CSV file will always be in that format because it comes from another part of the system, and if it were to change I would be the one asked to change it.

"Its not how hard you work, its how much you get done."