in reply to ideas on how to improve existing code

As you are already familiar with Text::CSV, please use it for reading the lines too (its getline method) instead of falling back to perl's <> operator. Things turn nasty quite fast when the CSV contains nested quotation, Unicode or newlines. (This line was updated as GrandFather just copied the use of OP's reading method).
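To make the newline problem concrete, here is a small sketch. The sample data and the variable names are made up for illustration, and it reads from an in-memory handle so it needs no file on disk. It feeds the same two records to both reading styles:

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Text::CSV_XS;

# Hypothetical sample: two records, the first has a newline inside a quoted field
my $data = qq{1,"first line\nsecond line",x\n2,plain,y\n};

# Style 1: perl's readline (<>) plus parse. No auto_diag here, so that
# parse failures are counted instead of reported.
my $csv_parse = Text::CSV_XS->new ({ binary => 1 });
open my $in1, "<", \$data or die "in-memory open: $!";
my ($lines, $parsed) = (0, 0);
while (my $line = <$in1>) {
    $lines++;
    $csv_parse->parse ($line) and $parsed++;
}
close $in1;

# Style 2: let the module read the records itself
my $csv = Text::CSV_XS->new ({ binary => 1, auto_diag => 1 });
open my $in2, "<", \$data or die "in-memory open: $!";
my @rows;
while (my $row = $csv->getline ($in2)) {
    push @rows, $row;
}
close $in2;

printf "readline + parse: %d lines read, %d parsed\n", $lines, $parsed;
printf "getline:          %d records read\n", scalar @rows;
```

The <> loop hands parse three physical lines for what are really two records, whereas getline returns two rows and keeps the embedded newline inside the second field of the first row.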

Check if you have installed Text::CSV_XS. Text::CSV is just a wrapper module over Text::CSV_PP and/or Text::CSV_XS. The XS version is about 100 times faster than the PP version and with lots of columns and/or rows, the difference adds up quite fast (see this graph).
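A quick way to check which backend you actually get, as a minimal sketch that assumes Text::CSV itself is installed and uses the backend and is_xs class methods described in its documentation:

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Text::CSV;

# Name of the module Text::CSV delegates to: Text::CSV_XS or Text::CSV_PP
my $backend = Text::CSV->backend;
my $is_xs   = Text::CSV->is_xs ? 1 : 0;

printf "Text::CSV uses %s version %s\n", $backend, $backend->VERSION;
$is_xs or print "Text::CSV_XS was not loaded; installing it should speed things up\n";
```

If this reports Text::CSV_PP, installing Text::CSV_XS from CPAN is normally all that is needed; Text::CSV picks it up by itself.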

update: I extended the speed comparison tests with a plain split approach (which obviously breaks on nested sep_char, newlines and other problematic content). Even then Text::CSV_XS can outperform plain perl! See here.

Use the getline method. Do not mix perl's readline (<>) with the parse method.

(Already mentioned) Use three-arg open calls and lexical handles.

Use builtin error reporting (the auto_diag attribute). No need for else-branches at all.

#!/usr/bin/perl

use strict;
use warnings;
use feature "say"; # say is not enabled by default

use Text::CSV_XS;

my $csv = Text::CSV_XS->new ({
    binary    => 1, # Always do so
    auto_diag => 1, # Makes finding bugs a whole lot easier
    });

my $file = "File_name.csv"; # Now you can use it in error reporting
open my $fh, "<", $file or die "$file: $!"; # Three-arg open

$csv->getline ($fh) for 1, 2; # Skip first two lines
while (my $row = $csv->getline ($fh)) {
    tr/;,/ ./ for @$row; # Replace ";" with a space and "," with "."
    say join "\t" => @$row;
    }
close $fh;

Enjoy, Have FUN! H.Merijn

Re^2: ideas on how to improve existing code
by GrandFather (Saint) on Aug 29, 2011 at 20:41 UTC

    I'm not sure where you get the "perl parsing method" idea from as my sample code in reply to the OP is a minor refactoring of the OP's code. It uses $csv->parse ($line) to parse individual lines from the file as does the OP's code.

    However, despite the OP's proffered guess that the code may be inefficient, the plea in the title of the node is "ideas on how to improve existing code", which I addressed by pointing to good coding practices related to maintenance and reliability. People seem to get hung up on "efficiency" issues without any need at all to make the code execute more quickly. Maintenance and reliability are generally much more important. If the OP were not already using a module for parsing CSV, I'd recommend using one just from the point of view of saving programmer time and making the code more reliable and maintainable - "efficiency" is generally completely irrelevant!

    True laziness is hard work

      It was nothing personal, but I am on a crusade to stop people who already use Text::CSV, Text::CSV_XS and/or Text::CSV_PP from using

      while (<$fh>) { my @row = $csv->parse ($_); ...

      which I was referring to with your "perl parsing method" (or any variation thereof) instead of

      while (my $row = $csv->getline ($fh)) { ...

      Which is not only faster, but immensely safer. Under the hood both <> and $csv->getline use perl's getline function and both respect the (perlio) layers, but the first form cannot tell an embedded $/ inside quotes from one at the end of a line, so it is very open to erroneous behavior. Besides that, CSV's getline is more lenient towards trailing carriage returns and/or newlines (unless you explicitly set the eol attribute).


      Enjoy, Have FUN! H.Merijn
Re^2: ideas on how to improve existing code
by trolis (Novice) on Aug 29, 2011 at 18:48 UTC
    Tux, thanks a lot for taking the time to explain the basics to me!