in reply to CSV headers. Feedback wanted
All feedback weighed, I have now committed and pushed:
This method does NOT work in perl-5.6.x
Parse the CSV header and set sep_char and encoding.
    my @hdr = $csv->header ($fh)->column_names;
    $csv->header ($fh, [ ";", ",", "|", "\t" ]);
    $csv->header ($fh, { bom => 1, fold => "lc" });
    $csv->header ($fh, [ ",", ";" ], { bom => 1, fold => "lc" });
The first argument should be a file handle.
This method assumes that the file opened for parsing has a header, and that the header does not contain problematic characters like embedded newlines. It reads the first line from the open handle and checks whether the column names are separated by one of the sequences in the allowed separator list. That list defaults to [ ";", "," ] and can be overruled with an optional argument: an anonymous list of allowed separator sequences. If exactly one of the allowed separators matches, sep_char is set to that sequence for the current CSV_XS instance and used to parse the first line. The resulting fields are mapped to lowercase and used to set the instance's column_names, after which the instance is returned:
    my $csv = Text::CSV_XS->new ({ binary => 1, auto_diag => 1 });
    open my $fh, "<:encoding(iso-8859-1)", "file.csv";
    $csv->header ($fh);
    while (my $row = $csv->getline_hr ($fh)) {
        ...
        }
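To make the detection step concrete, here is a minimal pure-Perl sketch of the idea. It is not the module's implementation: detect_sep is a hypothetical helper, it ignores quoting entirely, and returning undef when no separator is found is an assumption made for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper, NOT part of Text::CSV_XS. Returns the one separator
# from the allowed list that occurs in the header line, undef if none
# occurs, and dies if more than one occurs. Quoted fields are not handled.
sub detect_sep {
    my ($line, $allowed) = @_;
    $allowed ||= [ ";", "," ];    # default list as described in the post
    my @found = grep { index ($line, $_) >= 0 } @$allowed;
    @found > 1 and die "ambiguous header: more than one separator matched\n";
    return @found ? $found[0] : undef;
    }

my $line = "Name;City;Zip";
my $sep  = detect_sep ($line);
my @hdr  = map { lc $_ } split /\Q$sep\E/, $line;
print "separator '$sep', columns: @hdr\n";
```

A header such as "Name;City,Zip" makes the sketch die, which mirrors the "more than one unique separator" condition.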
If the header is empty, contains more than one unique separator out of the allowed set, contains empty fields, or contains identical fields (after folding), it will croak with error 1010, 1011, 1012, or 1013 respectively.
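The four failure conditions can be illustrated with a small standalone checker. This is only a sketch under assumptions: check_header is a hypothetical function, it returns the error number instead of croaking, it does not handle quoted fields, and falling back to "," when no separator is detected is my assumption rather than documented behavior.

```perl
use strict;
use warnings;

# Hypothetical checker, NOT the module's code. Returns 0 for an acceptable
# header, otherwise the error number the text above assigns to the problem.
sub check_header {
    my ($line, $allowed, $fold) = @_;
    return 1010 unless defined $line && $line =~ /\S/;  # empty header
    $line =~ s/\r?\n\z//;                               # drop line ending
    my @sep = grep { index ($line, $_) >= 0 } @$allowed;
    return 1011 if @sep > 1;                            # several separators
    my $sep = @sep ? $sep[0] : ",";                     # assumed fallback
    my @fld = split /\Q$sep\E/, $line, -1;              # keep empty fields
    return 1012 if grep { $_ eq "" } @fld;              # empty field
    @fld = map { $fold eq "lc" ? lc $_ : $fold eq "uc" ? uc $_ : $_ } @fld;
    my %seen;
    return 1013 if grep { $seen{$_}++ } @fld;           # duplicate after fold
    return 0;
    }

for my $hdr ("Name;City", "Name;City,Zip", "Name;;Zip", "Name;name") {
    printf "%-15s => %d\n", $hdr, check_header ($hdr, [ ";", "," ], "lc");
    }
```

Note that "Name;name" only counts as a duplicate because of folding; with fold set to "none" the same header passes.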
This method returns the instance on success. On failure it either croaks or, if it did not croak, returns undef.
The default behavior is to detect if the header line starts with a BOM. If the header has a BOM, use that to set the encoding of $fh. This default behavior can be disabled by passing a false value to the bom option.
Supported encodings from BOM are: UTF-8, UTF-16BE, UTF-16LE, UTF-32BE, UTF-32LE, UTF-1, UTF-EBCDIC, SCSU, BOCU-1, and GB-18030. UTF-7 is not supported.
This is work in progress. Currently only UTF-8 is working as expected.
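For reference, BOM detection for the encodings listed above could look roughly like the following. This is a sketch, not what the module does internally: detect_bom is a hypothetical function, and the encoding names are plain labels rather than names guaranteed to be accepted by Perl's Encode layer.

```perl
use strict;
use warnings;

# BOM signatures for the encodings listed above. Longer signatures are
# checked first so UTF-32LE (FF FE 00 00) is not reported as UTF-16LE (FF FE).
my @bom = (
    [ "UTF-32BE"   => "\x00\x00\xFE\xFF" ],
    [ "UTF-32LE"   => "\xFF\xFE\x00\x00" ],
    [ "UTF-EBCDIC" => "\xDD\x73\x66\x73" ],
    [ "GB-18030"   => "\x84\x31\x95\x33" ],
    [ "UTF-8"      => "\xEF\xBB\xBF"     ],
    [ "UTF-1"      => "\xF7\x64\x4C"     ],
    [ "SCSU"       => "\x0E\xFE\xFF"     ],
    [ "BOCU-1"     => "\xFB\xEE\x28"     ],
    [ "UTF-16BE"   => "\xFE\xFF"         ],
    [ "UTF-16LE"   => "\xFF\xFE"         ],
    );

# Hypothetical helper: returns (encoding label, BOM length in bytes) for the
# raw bytes at the start of a file, or an empty list if no BOM is present.
sub detect_bom {
    my ($bytes) = @_;
    for my $entry (@bom) {
        my ($enc, $sig) = @$entry;
        return ($enc, length $sig) if substr ($bytes, 0, length $sig) eq $sig;
        }
    return;
    }

my ($enc, $len) = detect_bom ("\xEF\xBB\xBFname,city\n");
print "encoding $enc, skip $len bytes\n";
```

The input has to be the raw, undecoded bytes of the first line, so a caller would read them before applying any encoding layer to the handle.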
The default is to fold the header to lower case. You can also choose to fold the headers to upper case with { fold => "uc" } or to leave the fields as-is with { fold => "none" }.
The default is to set the instance's column names using column_names if the method is successful, so subsequent calls to getline_hr can return a hash. Setting the column names can be disabled by passing a false value for this option, like { columns => 0 }.