constantreader has asked for the wisdom of the Perl Monks concerning the following question:

Dearest Monks,

Does anyone know of a way to get Text::CSV_XS to ignore blank lines? Or at least to let me easily determine that a line has no data? I'm using getline_hr and column_names to get a hashref. On a blank line (i.e., /^$/), I still get a hashref with all the keys and undefs for values. Ideally, it would return an empty hashref.

The only way I have figured out how to do this is to get the line, regex it, and if it has data, then pass it to parse and then to fields. This seems rather inelegant.

Thanks

Update: I just realized the only way to get a hashref is to use getline_hr. fields only returns an array.

Re: Text::CSV_XS and blank lines
by Jim (Curate) on Feb 03, 2011 at 22:18 UTC

    This may be a situation where you're forced to forgo the use of the convenience function, getline_hr, and roll your own code to generate a hash of the field values keyed by field labels. I do this a lot myself because I'm forced to use a version of Text::CSV_XS that predates the introduction of getline_hr.

    my %value_of;
    $csv->parse($csv_record);
    @value_of{ @ordered_field_labels } = $csv->fields();

    Typically, I get the ordered field labels from the CSV file header.

    # $INPUT_LINE_NUMBER needs "use English;" (it is the long name for $.)
    my @ordered_field_labels;
    while (my $csv_record = <>) {
        $csv->parse($csv_record);
        my @values = $csv->fields();
        if ($INPUT_LINE_NUMBER == 1) {
            @ordered_field_labels = @values;
            # ...
        }
        # ...
    }

    It's not inelegant. It's just a little less convenient.
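A minimal sketch of the roll-your-own approach, using core Perl only. The helper name record_to_hash and the sample values are made up for illustration; the values stand in for whatever $csv->fields() returns. It assumes a blank line parses to a single empty (or undef) field, which is worth checking against your version of Text::CSV_XS.

```perl
use strict;
use warnings;

# Hypothetical helper: build a hashref from parsed values, or return
# undef when the record looks like a blank line.
sub record_to_hash {
    my ($labels, $values) = @_;

    # Assumption: a blank line yields no fields, or one empty/undef field.
    return if @$values == 0;
    return if @$values == 1
        && (!defined $values->[0] || $values->[0] eq '');

    my %value_of;
    @value_of{ @$labels } = @$values;    # hash slice: labels => values
    return \%value_of;
}

# Made-up sample data standing in for the output of $csv->fields().
my @labels = qw( id name city );
my $full   = record_to_hash(\@labels, [ 1, 'Alice', 'Utrecht' ]);
my $blank  = record_to_hash(\@labels, [ '' ]);

print defined $blank ? "got data\n" : "blank record skipped\n";
```

The caller only has to test whether the result is defined, which is close to the "empty hashref" behaviour asked for in the original question.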

    (The blank lines are outside any quoted strings, right? They're not literal blank lines within fields are they?)

      By "blank lines" I mean a line with nothing but a \n.

      Thanks for the sanity check.

        Your definition of what constitutes a blank line is understood. The question is about the context in which those blank lines occur within the file. Are the blank lines within or without quoted strings? In other words, are the blank lines part of the literal data?

        That's the sanity check.

      This code would break any data that has embedded newlines. You should not rely on perl's diamond operator in CSV parsing.


      Enjoy, Have FUN! H.Merijn
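One way to follow that advice, as a sketch: let getline read the records so quoted fields with embedded newlines stay intact, and build the hash with a slice. The sample data and variable names are made up, and the blank-line test assumes an empty line comes back as a single empty (or undef) field.

```perl
use strict;
use warnings;
use Text::CSV_XS;

# Made-up sample: header, a record with an embedded newline,
# a blank line, and a final record.
my $data = qq{id,note\n1,"line one\nline two"\n\n2,plain\n};
open my $fh, '<', \$data or die "Cannot open in-memory file: $!";

# binary => 1 is needed for fields that contain newlines.
my $csv = Text::CSV_XS->new({ binary => 1, auto_diag => 1 });

my @labels = @{ $csv->getline($fh) };    # first record is the header
my @records;
while (my $row = $csv->getline($fh)) {
    # Assumption: a blank line is returned as one empty/undef field.
    next if @$row == 1 && (!defined $row->[0] || $row->[0] eq '');

    my %value_of;
    @value_of{ @labels } = @$row;        # hash slice: labels => values
    push @records, \%value_of;
}
close $fh;

printf "kept %d records\n", scalar @records;
```

Because the parser, not the diamond operator, decides where a record ends, the newline inside the quoted note field is kept as data, while the truly blank line is skipped.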
Re: Text::CSV_XS and blank lines
by Tux (Canon) on Feb 04, 2011 at 07:09 UTC

    Would it help if I would add a feature that makes the number of fields parsed in the last line available?

    Technically, an empty line is valid CSV; the fact that it has no fields doesn't make it illegal. With row fetches, you notice immediately, but with hashref fetches you don't.

    I'd mean something like

    while (my $h = $csv->getline_hr ($fh)) {
        print "You fetched ", $csv->{field_count}, " fields into your hash\n";
        }

    Suggestions for a better name are welcome.


    Enjoy, Have FUN! H.Merijn
      Would it help if I would add a feature that makes the number of fields parsed in the last line available?

      Yes, it would indeed. With a field_count property, one could easily skip truly blank lines in the CSV file.

      CSV_RECORD:
      while (my $value_of = $csv->getline_hr($csv_fh)) {
          next CSV_RECORD if $csv->{field_count} == 1;
          # ...
      }

      This is more elegant than having to evaluate keys %value_of in a scalar context to determine the number of fields or, in the case of parse(), evaluating @values in a scalar context.

      UPDATE: Erased the confused bit about evaluating keys %value_of in a scalar context, which wouldn't help.

        As the _hr variants are the odd ones out in the code, counting fields inside the parser actually was kinda awkward. And the only way to do it reliably, as far as I could see in my first try, was definitely not doing any good to the parsing speed of "regular" parses. All the other parse methods return an array or an array reference. That means that you can very easily check the length of the array to see how many fields were parsed.

        What I did instead, was this:

        is_missing

            my $missing = $csv->is_missing ($column_idx);

        Where $column_idx is the (zero-based) index of the column in the last result of "getline_hr".

            while (my $hr = $csv->getline_hr ($fh)) {
                $csv->is_missing (0) and next; # This was an empty line
                }

        When using "getline_hr" for parsing, it is impossible to tell if the fields are "undef" because they were not filled in the CSV stream or because they were not read at all, as all the fields defined by "column_names" are set in the hash-ref. If you still need to know if all fields in each row are provided, you should enable "keep_meta_info" so you can check the flags.

        Your constructor would then look somewhat like

        my $csv = Text::CSV_XS->new ({
            auto_diag      => 1,
            binary         => 1,
            keep_meta_info => 1,
            });

        Tell me if that would work for you ... (BTW feel free to pull from here)


        Enjoy, Have FUN! H.Merijn