Takuan Soho has asked for the wisdom of the Perl Monks concerning the following question:
### Test program

```perl
use strict;
use warnings;
use Text::CSV;

my $input_file  = "ACGT.csv";
my $output_file = "test.txt";

open ENTREE, "<", $input_file
    or die "Could not open handle ENTREE for $input_file. $!\n";
open SORTIE, ">", $output_file
    or die "Could not open handle SORTIE for $output_file. $!\n";

my $start = time();

my $csv_reader = Text::CSV->new();
my @columns;

# For each CSV line, write column 3 and column 1, tab-separated.
while (<ENTREE>) {
    $csv_reader->parse($_);
    @columns = $csv_reader->fields();
    print SORTIE "$columns[2]\t$columns[0]\n";
}

my $end      = time();
my $duration = $end - $start;
print "Reading took $duration seconds.\n";

close ENTREE;
close SORTIE;
```
On a huge file (a couple hundred columns, around a million rows), it took 3858 seconds. More than an hour just to read through a file! Am I doing something inherently wrong?
As a comparison, I converted the CSV file into a tab-separated file with a small Python script based on Python's csv module, and it took less than 200 seconds. Then, back in the Perl script, I did the same transformation (using split this time) in 273 seconds.
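In case it helps, the split version was more or less the following (a rough sketch, not the exact script I timed; note that split is only equivalent to Text::CSV here when no field contains a quoted, embedded comma):

```perl
use strict;
use warnings;

my $input_file  = "ACGT.csv";
my $output_file = "test.txt";

open ENTREE, "<", $input_file  or die "Could not open $input_file. $!\n";
open SORTIE, ">", $output_file or die "Could not open $output_file. $!\n";

my $start = time();

# Same loop as above, but splitting on commas instead of using Text::CSV.
# This only matches Text::CSV's output when no field contains a quoted,
# embedded comma or newline; plain split cannot handle those cases.
while (my $line = <ENTREE>) {
    chomp $line;
    my @columns = split /,/, $line;
    print SORTIE "$columns[2]\t$columns[0]\n";
}

my $duration = time() - $start;
print "Reading took $duration seconds.\n";

close ENTREE;
close SORTIE;
```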
Do you know of a way to deal with CSV files in Perl that gets the same kind of efficiency, without having to keep a Python script on the side?
Thank you very much!