the say method I used in the code is the say method of Text::CSV_XS and does not require a recent perl to do what it is supposed to do. It acts as if you have a second CSV objects that has eol => "\r\n". This will work back to perl-5.6.1. If both your input and output documents have the same line ending, set it in the constructor and use print instead.

Can you provide us with sample data and the simple code you use? If your data is indeed (very) simple and has no embedded newlines or escapes, then your code could be optimized to skip all situations that Text::CSV_XS has to consider. In that case, Text::CSV::Easy_XS or Text::CSV::Easy_PP might be of interest, as they do not allow altering the settings like separation character or quotation. They however have no means to check if the original field was quoted, which was what you stated as a precondition for your request. Text::CSV_XS might also be faster if you drop that requirement and/or if you know in advance which columns are object to the substitution.

If you are not interested in keeping the original quotation, and you only want to replace the comma in column 2 you can gain a lot. As an example test, I created a 1_400_000 line file with just the two lines from your data example and have two tests. The first keeps original quotation and replaces , to - in all quoted columns. The second ignores all quotation requirements and replaces all , with - just in the second column, still generationg correct CSV data:

use 5.20.0; use warnings; use Text::CSV_XS; my $csv = Text::CSV_XS->new ({ binary => 1, auto_diag => 1, quote_space => 0, keep_meta_info => 11, }); open my $io, "<", "test.csv"; open my $oh, ">", "out.csv"; my @row = ("") x 8; # The CSV has 8 columns $csv->bind_columns (\(@row)); while ($csv->getline ($io)) { $row[$_] =~ tr{,}{-} for grep { $csv->is_quoted ($_) } 0..$#row; $csv->say ($oh, \@row); }
$ time perl test.pl 20.120u 0.156s 0:20.48 98.9% 0+0k 13888+214656io 0pf+0w
use 5.20.0; use warnings; use Text::CSV_XS; my $csv = Text::CSV_XS->new ({ binary => 1, auto_diag => 1, eol => "\n", }); open my $io, "<", "test.csv"; open my $oh, ">", "out.csv"; my @row = ("") x 8; $csv->bind_columns (\(@row)); while ($csv->getline ($io)) { $row[1] =~ tr{,}{-}; $csv->print ($oh, \@row); }
$ time perl test2.pl 5.564u 0.144s 0:05.90 96.6% 0+0k 16+214656io 0pf+0w

That is 3.6 times faster. YMMV


Enjoy, Have FUN! H.Merijn

In reply to Re^3: Modifying CSV File by Tux
in thread Modifying CSV File by roho

Title:
Use:  <p> text here (a paragraph) </p>
and:  <code> code here </code>
to format your post, it's "PerlMonks-approved HTML":



  • Posts are HTML formatted. Put <p> </p> tags around your paragraphs. Put <code> </code> tags around your code and data!
  • Titles consisting of a single word are discouraged, and in most cases are disallowed outright.
  • Read Where should I post X? if you're not absolutely sure you're posting in the right place.
  • Please read these before you post! —
  • Posts may use any of the Perl Monks Approved HTML tags:
    a, abbr, b, big, blockquote, br, caption, center, col, colgroup, dd, del, details, div, dl, dt, em, font, h1, h2, h3, h4, h5, h6, hr, i, ins, li, ol, p, pre, readmore, small, span, spoiler, strike, strong, sub, summary, sup, table, tbody, td, tfoot, th, thead, tr, tt, u, ul, wbr
  • You may need to use entities for some characters, as follows. (Exception: Within code tags, you can put the characters literally.)
            For:     Use:
    & &amp;
    < &lt;
    > &gt;
    [ &#91;
    ] &#93;
  • Link using PerlMonks shortcuts! What shortcuts can I use for linking?
  • See Writeup Formatting Tips and other pages linked from there for more info.