in reply to lightweight CSV parser

Any advice appreciated.

Ok, here goes.



17 my $opentag = $config->{'moredata_opentag'} ? $config->{'mor +edata_opentag'} : ''; 19 my $closetag = $config->{'moredata_closetag'} ? $config->{'m +oredata_closetag'} : ''; # not required 24 $dataname = defined($val->[0]) ? $val->[0] : '' ; #datanam +e can be empty 25 $format = $val->[1] ? ($val->[1]) : $format_cfg ; 27 $dataname = $val ? $val : '';

Those are usually written as:

17 my $opentag = $config->{ moredata_opentag } || ''; 19 my $closetag = $config->{ moredata_closetag } || ''; # not r +equired 24 $dataname = @$val ? $val->[0] : '' ; #dataname can be empt +y 25 $format = $val->[1] || $format_cfg ; 27 $dataname = $val || '';

And:

17 my $opentag = $config->{'moredata_opentag'} ? $config->{'mor +edata_opentag'} : ''; 18 die "MoreData requires an open tag: $!" unless ($opentag); 32 my $substrings_aref = _retrieve_strings($str, $opentag, $clo +setag, $dataname); 33 unless (scalar($substrings_aref)) { # no stringrefs returned 34 die "cannot extract substrings from string: $!"; 35 } 41 my $datasep_cfg = $config->{'moredata_datasep'}; 42 length($datasep_cfg) or die "The data separation string must + be configured: $!"; 43 my $hashsep_cfg = $config->{'moredata_hashsep'} || ''; 44 length($hashsep_cfg) or die "The hash separation string must + be configured: $!";

Would probably be better written as:

17 my $opentag = $config->{ moredata_opentag } or die "MoreData + requires an open tag\n"; 32 my $substrings_aref = _retrieve_strings($str, $opentag, $clo +setag, $dataname) or die "cannot extract substrings from string\n"; 41 my $datasep_cfg = $config->{ moredata_datasep } or die "The +data separation string must be configured\n"; 43 my $hashsep_cfg = $config->{ moredata_hashsep } or die "The +hash separation string must be configured\n";


18 die "MoreData requires an open tag: $!" unless ($opentag); 30 die "No format specified in args or configuration: $!" unles +s $format; 33 unless (scalar($substrings_aref)) { # no stringrefs returned 34 die "cannot extract substrings from string: $!"; 35 } 42 length($datasep_cfg) or die "The data separation string must + be configured: $!"; 44 length($hashsep_cfg) or die "The hash separation string must + be configured: $!"; 52 } else { 53 die "no format : $!"; 54 }

The $! variable only contains useful information directly after the use of a function that accesses the system.    A mathematical, string or boolean expression has no effect on it.

$substrings_aref is a scalar variable and it is used in a boolean context so the use of scalar there is completely superfluous.



72 die "Count of $count is not a pair of values in hash assig +nment of values @array : $!" if ($count != (2 || 0));

Because 2 is a TRUE value the expression is just if $count != 2.    Or did you really want the expression: if $count != 2 || $count != 0?



98 my @substrings = ($content, $datastring); 102 return \@substrings unless ($stringlength); 105 return \@substrings if ($openposition == -1); 121 @substrings = ($content, $datastring); 122 return \@substrings if ($dataname eq '__data__'); # return c +ontent and data as strings 123 return \@substrings if ($dataname eq '__content__'); # retur +n content and data as strings 124 return \@substrings unless length($datastring); # no reason +to process empty datastring 130 @substrings = ($content, ''); 131 return \@substrings; 144 @substrings = ($content, $datastring); 145 return \@substrings;

You don't need an array to return an array reference:

102 return [$content, $datastring] unless ($stringlength); 105 return [$content, $datastring] if ($openposition == -1); 122 return [$content, $datastring] if ($dataname eq '__data__'); + # return content and data as strings 123 return [$content, $datastring] if ($dataname eq '__content__ +'); # return content and data as strings 124 return [$content, $datastring] unless length($datastring); # + no reason to process empty datastring 131 return [$content, '']; 145 return [$content, $datastring];


148 sub _escape { 149 my $separator = shift; 150 my $regex = '\\' . join('\\', split(/''/, $separator) ); 151 return $regex; 152 }

I am not sure what this subroutine is supposed to be doing but maybe you should have a look at the quotemeta function or the "\Q...\E" quotemeta escape sequence.    For example:

64 sub _moredata_hash { 65 my ($datastring, $datasep, $hashsep) = @_; 68 my %hash; 69 my @list = split /\s*\Q$datasep\E\s*/, $datastring; 70 foreach my $item (@list) { 71 my $count = my @array = split(/\s*\Q$hashsep\E\s*/, $item) +; 72 die "Count of $count is not a pair of values in hash assig +nment of values @array : $!" if ($count != (2 || 0)); 73 $hash{$array[0]} = $array[1]; 74 } 75 scalar %hash ? \%hash : undef; 76 }


Replies are listed 'Best First'.
Re^2: lightweight CSV parser
by wrinkles (Pilgrim) on Dec 22, 2011 at 17:25 UTC
    Thanks, that was a huge help and very generous of you. Here's what I take away:

    • Don't use the ternary operator when the value tested is the value to be set. Use the || operator.
    • $! contains system call information.
    • It's often better to create an anonymous array directly.
    • Use the quotemeta operator for interpolating strings into regular expressions.
    Thanks a lot!