trolis has asked for the wisdom of the Perl Monks concerning the following question:

Dear Perlmonks,

I have written a simple Perl script (which works just fine) to open a CSV file, skip the two header lines, and reformat the columns from 400;0,34567; to 400 0.34567 (change semicolon to space and comma to dot). I would appreciate your ideas on how to improve the following code:

#!/usr/bin/perl
use strict;
use warnings;
use Text::CSV;

my $csv = Text::CSV->new;

open (CSV, "<File_name.csv") or die $!;
while (<CSV>) {
    next if (1 .. 2);
    $_ =~ s/\;/ /g;
    $_ =~ s/,/\./g;
    if ($csv->parse($_)) {
        my @columns = $csv->fields();
        print "@columns\n";
    }
    else {
        my $err = $csv->error_input;
        print "Failed to parse line: $err";
    }
}
close CSV;

Somehow I have a feeling that this code is not efficient, but it works. Is it worth improving it, or should I just use it as it is? I know I could do this simple character replacement directly in vim, or with sed or awk, but I want to use Perl instead. Thank you!

Replies are listed 'Best First'.
Re: ideas on how to improve existing code
by GrandFather (Saint) on Aug 29, 2011 at 09:55 UTC

    There is no efficiency issue of note. In particular, if it does the job you need fast enough, it is sufficiently efficient. However, there is a bunch of stuff that can be improved.

    For a start, you should always use the three-argument version of open and lexical file handles:

    open my $csvIn, '<', $fileName or die "Failed to open $fileName: $!";

    It also helps if error strings provide more than the bare minimum of information. For an open, including the file name that was actually used (rather than hoping it is what you think it ought to be) helps a huge amount.

    Skipping the first two lines in the loop that handles the rest of the lines is OK, but not nearly as clear as skipping them before the loop:

    <$csvIn> for 1 .. 2;
    while (defined (my $line = <$csvIn>)) {

    Using the tr/// (translate) operator for simple character translation is fast and clear:

    $line =~ tr/;,/ ./;

    Although if you were to use regular expressions, you should note that there is no need to escape ; in a pattern, and no need to escape anything in the replacement string (except stuff you'd escape if it were a double quoted string).
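    For instance, the OP's two substitutions can be written without any backslashes:

    $line =~ s/;/ /g;    # ';' is not a regex metacharacter
    $line =~ s/,/./g;    # '.' is literal in the replacement string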

    Using consistent indentation helps a lot!

    I prefer to make function calls explicit by supplying parentheses even if they aren't strictly required. At the very least you should be consistent (cf. $csv->fields and $csv->error_input).

    Taking that all together you'd end up with something like the following (untested) code:

    #!/usr/bin/perl
    use strict;
    use warnings;
    use Text::CSV;

    my $csv = Text::CSV->new ();
    my $fileName = 'File_name.csv';

    open my $csvIn, '<', $fileName or die "Can't open $fileName: $!";
    <$csvIn> for 1 .. 2;

    while (defined (my $line = <$csvIn>)) {
        $line =~ tr/;,/ ./;
        if ($csv->parse ($line)) {
            my @columns = $csv->fields ();
            print "@columns\n";
        } else {
            my $err = $csv->error_input;
            print "Failed to parse line $.: $err";
        }
    }
    close $csvIn;
    True laziness is hard work
Re: ideas on how to improve existing code
by Tux (Canon) on Aug 29, 2011 at 13:39 UTC

    As you are already familiar with Text::CSV, please keep using its getline method instead of falling back to perl's <> readline. Things turn nasty quite fast when the CSV contains nested quotation, Unicode, or embedded newlines. (This paragraph was updated, as GrandFather's code just copied the OP's reading method.)
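    For instance, this (made-up) record is a single CSV row:

    400;"a note with an
    embedded newline";0,5

    Read with <>, it arrives as two broken fragments; getline returns the one three-field row intact.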

    Check whether you have Text::CSV_XS installed. Text::CSV is just a wrapper module over Text::CSV_PP and/or Text::CSV_XS. The XS version is about 100 times faster than the PP version, and with lots of columns and/or rows the difference adds up quite fast (see this graph).
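    A quick way to check what you have (a sketch; the second one-liner assumes a Text::CSV recent enough to provide the backend class method, which reports the module it loaded):

    $ perl -MText::CSV_XS -e1 && echo XS installed
    $ perl -MText::CSV -le 'print Text::CSV->backend'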

    Update: I extended the speed comparison tests with a plain split approach (which obviously breaks on nested sep_char, newlines, and other problematic content). Even then, Text::CSV_XS can outperform plain perl! See here.

    Use the getline method. Do not mix perl's readline (<>) with the parse method.

    (Already mentioned) Use three-arg open calls and lexical handles.

    Use builtin error reporting (the auto_diag attribute). No need for else-branches at all.

    #!/usr/bin/perl
    use strict;
    use warnings;
    use feature "say";  # say () is used below
    use Text::CSV_XS;

    my $csv = Text::CSV_XS->new ({
        binary    => 1,  # Always do so
        auto_diag => 1,  # Makes finding bugs a whole lot easier
        });

    my $file = "File_name.csv";  # Now you can use it in error reporting
    open my $fh, "<", $file or die "$file: $!";  # Three-arg open

    $csv->getline ($fh) for 1, 2;  # Skip first two lines

    while (my $row = $csv->getline ($fh)) {
        tr/;,/ ./ for @$row;
        say join "\t" => @$row;
        }
    close $fh;

    Enjoy, Have FUN! H.Merijn

      I'm not sure where you get the "perl parsing method" idea from, as my sample code in reply to the OP is a minor refactoring of the OP's code. It uses $csv->parse ($line) to parse individual lines from the file, as does the OP's code.

      However, despite the OP's proffered guess that the code may be inefficient, the plea in the title of the node is "ideas on how to improve existing code", which I address by pointing to maintenance and reliability related good coding practices. People seem to get hung up on "efficiency" issues without any need at all to make the code execute more quickly. Maintenance and reliability are generally much more important. If the OP were not already using a module for parsing CSV, I'd recommend one just from the point of view of saving programmer time and making the code more reliable and maintainable - "efficiency" is generally completely irrelevant!

      True laziness is hard work

        It was nothing personal, but I am on a crusade to prevent people who already use Text::CSV, Text::CSV_XS and/or Text::CSV_PP from using

        while (<$fh>) {
            my @row = $csv->parse ($_);
            ...

        which is what I was referring to as your "perl parsing method" (or any variation thereof), instead of

        while (my $row = $csv->getline ($fh)) {
            ...

        The latter is not only faster, but immensely safer. Under the hood both <> and $csv->getline use perl's getline function and both respect the (perlio) layers, but the former is not able to detect the difference between an embedded $/ inside quotes and one at the end of a line, so it is very open to erroneous behavior. Besides that, CSV's getline is more lenient towards trailing carriage returns and/or newlines (unless you explicitly set the eol attribute).
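        A small self-contained demo of that difference (the data is made up; an in-memory filehandle stands in for a real file):

        #!/usr/bin/perl
        use strict;
        use warnings;
        use Text::CSV_XS;

        my $data = qq{1;"two\nlines";3\n};  # ONE record with an embedded newline
        my $csv  = Text::CSV_XS->new ({ binary => 1, sep_char => ";" });

        # Line-based reading splits the record at the embedded newline,
        # so both halves fail to parse:
        open my $fh, "<", \$data or die $!;
        while (<$fh>) {
            print $csv->parse ($_) ? "parse ok\n" : "parse FAILED\n";
        }
        close $fh;

        # getline sees the quoted newline as part of the field:
        open $fh, "<", \$data or die $!;
        my $row = $csv->getline ($fh);
        printf "%d fields, second field is '%s'\n", scalar @$row, $row->[1];
        close $fh;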


        Enjoy, Have FUN! H.Merijn
      Tux, thanks a lot for taking the time to explain the basics to me!
Re: ideas on how to improve existing code
by Anonymous Monk on Aug 29, 2011 at 09:02 UTC

    It is not clear why you are diddling the data, what you call "format the columns", before parsing. You should do that after parsing the CSV, not before.

    Somehow I have a feeling that this code is not efficient, but it works.

    That's just gas, don't worry about it.

    Is it worth improving it, or should I just use it as it is?

    The code is literally 20 lines; there is nothing to improve.

    The only possible thing is to use "modern perl" practices, meaning the three-argument open and a lexical filehandle, and maybe autodie, because the error message is better:

    $ perl -Mautodie -le "open my $fh => '<', '23232'"
    Can't open '23232' for reading: 'No such file or directory' at -e line 1
      Thanks for the reply and comments.

      My initial goal is to convert those files to the default format which gnuplot accepts (space-delimited columns of data with a dot as decimal separator). I have many CSV files which all look slightly different, so the idea was to make them all "the same". For sure there could be many other ways to complete the task, but somehow I have chosen Perl. The script should grow over time as I plan to implement more features, but I have just started.
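      For what it's worth, a minimal sketch of where that could go, assuming the separator varies per file and is passed as an argument (the file name and separator here are placeholders):

      #!/usr/bin/perl
      use strict;
      use warnings;
      use Text::CSV;

      my ($file, $sep) = @ARGV;                 # e.g.: normalize.pl data.csv ';'
      $sep ||= ';';

      my $csv = Text::CSV->new ({ binary => 1, sep_char => $sep, auto_diag => 1 });

      open my $fh, '<', $file or die "$file: $!";
      $csv->getline ($fh) for 1, 2;             # skip the two header lines
      while (my $row = $csv->getline ($fh)) {
          s/,/./g for @$row;                    # decimal comma -> decimal dot
          print join (' ', @$row), "\n";        # space-delimited for gnuplot
      }
      close $fh;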

Re: ideas on how to improve existing code
by JavaFan (Canon) on Aug 29, 2011 at 11:27 UTC
    Is it worth improving it, or should I just use it as it is?
    You tell us! Does it work for you? Is it fast enough for you? If the answer is "yes" to both questions, leave it alone and do something else with your time.
      As I said, it works, but I was looking, let's say, for a more proper way to get the same result.

      Thanks to everybody, and especially you, GrandFather, for the suggestions & advice.

Re: ideas on how to improve existing code
by i5513 (Pilgrim) on Aug 29, 2011 at 13:23 UTC
    If the CSV is simple enough and you know it comes from a tool which produces CSV files correctly, I don't see why you'd use Text::CSV. Maybe a simple substitution would work?
    open (CSV, "<File_name.csv") or die $!;
    while (<CSV>) {
        next if (1 .. 2);
        s/;/ /g;
        s/(\d),(\d)/$1.$2/g;
        print;
    }
      Note that this breaks on any CSV data that contains quoted strings with embedded commas:
      this,line,has,6,"fields, not",7
      Just use Text::CSV and you don't have to worry about things like that.