in reply to need to optimize my sub routine
The simplest speedup should be to move from reading line-by-line and using parse () to using getline (). This will pay off even more once you allow binary data and/or embedded newlines. I did a small benchmark on my machine:
```perl
/home/merijn> cat test.pl
#!/pro/bin/perl

use strict;
use warnings;

use Benchmark qw( cmpthese );
use Text::CSV_XS;
use IO::Handle;

my $csv = Text::CSV_XS->new;
my @f;

sub diamond {
    open my $io, "<", "test.csv" or die "test.csv: $!";
    while (<$io>) {
        $csv->parse ($_);
        @f = $csv->fields;
        }
    } # diamond

sub intern {
    open my $io, "<", "test.csv" or die "test.csv: $!";
    while (my $row = $csv->getline ($io)) {
        @f = @$row;
        }
    } # intern

cmpthese (-5, {
    "diamond" => \&diamond,
    "getline" => \&intern,
    });
/home/merijn> wc -l test.csv
12000 test.csv
/home/merijn> perl test.pl
          Rate diamond getline
diamond 6.89/s      --    -39%
getline 11.3/s     64%      --
/home/merijn>
```
You can use the first field to do your after-matches.
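A minimal sketch of what that could look like with getline: test the first field of each parsed row and skip rows that don't match before doing any further work. The file name and the matched value here are made-up placeholders, not from the original post.

```perl
#!/usr/bin/perl
use strict;
use warnings;

use Text::CSV_XS;

my $csv = Text::CSV_XS->new ({ binary => 1 });

# "test.csv" and "wanted-key" are illustrative assumptions
open my $io, "<", "test.csv" or die "test.csv: $!";
while (my $row = $csv->getline ($io)) {
    # $row->[0] is the first field; match on it first,
    # so non-matching rows cost almost nothing
    $row->[0] eq "wanted-key" or next;

    # ... process the remaining fields of matching rows here
    }
close $io;
```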
Re^2: need to optimize my sub routine
by convenientstore (Pilgrim) on Feb 21, 2008 at 01:09 UTC

Re^2: need to optimize my sub routine
by convenientstore (Pilgrim) on Feb 21, 2008 at 05:51 UTC

by Tux (Canon) on Feb 21, 2008 at 07:20 UTC