in reply to testing files for valid content
    perl -lne '$n=tr/,//;$h{$n}++;END{print "$h{$_} rows have $_ commas" for (keys %h)}' some_file.csv

If that produces only one line of output for a given file, you know the file has the same number of commas in all rows (and it tells you how many commas per row).
If some lines have more and others have fewer, you'll get a breakdown of the variance. It still could be a "valid" CSV file, if lines with extra commas happen to have quotes or escapes (meaning that you really need to use a parsing module like Text::xSV).
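To illustrate why a parser matters here, the following is a minimal sketch that counts fields per row instead of raw commas. It assumes Text::CSV from CPAN is installed, which is a different module from the Text::xSV mentioned above; the helper name field_counts and the sample rows are made up for the example.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Text::CSV;    # assumption: installed from CPAN; not the module named in the post

# Count fields per row with a real parser, so that commas inside
# quoted fields are not mistaken for separators.
# field_counts() is a hypothetical helper name used only for this sketch.
sub field_counts {
    my (@lines) = @_;
    my $csv = Text::CSV->new( { binary => 1 } )
        or die "Cannot create parser: " . Text::CSV->error_diag;
    my %rows_with;    # field count => number of rows with that count
    for my $line (@lines) {
        if ( $csv->parse($line) ) {
            my @fields = $csv->fields;
            $rows_with{ scalar @fields }++;
        }
        else {
            $rows_with{unparsable}++;    # malformed quoting, etc.
        }
    }
    return \%rows_with;
}

# Made-up sample rows: the second has a comma inside a quoted field.
my $counts = field_counts(
    'a,b,c',
    '"Smith, John",b,c',
    'a,b',
);
print "$counts->{$_} rows have $_ fields\n" for sort keys %$counts;
```

A row such as the second one has three commas but only three fields, so the comma-counting one-liner would flag it as irregular while a parser would not.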
Apart from that, even if the CSV data is simple (no quoted/escaped commas) and has the same number of commas on every line, you need to be careful with your use of split() -- this would be wrong:

    split(",")

You should do it like this instead:

    split( /,/, $_, -1 );

If you don't do that, split() will ignore "extra" commas at the end of a line -- e.g. this:

    @array = split( /,/, "field1,field2,,field4,field5,,," );

will fill @array like this:

    ( 'field1', 'field2', '', 'field4', 'field5' );

Note how the trailing empty fields are truncated (the empty field in the middle is kept, as an empty string). Please read about split.
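As a quick check of the difference, here is a small sketch that splits the same sample line both ways and compares the field counts with the comma count; the variable names are only for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $line = "field1,field2,,field4,field5,,,";

my $commas  = ( $line =~ tr/,// );      # count commas without changing $line
my @default = split /,/, $line;         # no limit: trailing empty fields are dropped
my @limited = split /,/, $line, -1;     # negative limit: trailing empty fields are kept

printf "commas:   %d\n", $commas;
printf "default:  %d fields\n", scalar @default;
printf "limit -1: %d fields\n", scalar @limited;

# With the -1 limit, the field count is always commas + 1 for simple CSV.
print "field count matches comma count + 1\n"
    if @limited == $commas + 1;
```

Only the version with the -1 limit gives a field count you can compare against the comma count from the one-liner.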