karlberry has asked for the wisdom of the Perl Monks concerning the following question:

I have a csv (actually tsv, but I don't think that matters) with junk lines before the header. I want to parse it with Text::CSV. The file looks like:
first junk line
then a blank line
another junk line
fieldname1,fieldname2
value1,value2
It is trivial to skip ahead to the /fieldname/ line, since I know what the first field name is. What I cannot figure out is how to manage the filehandle so that getline_hr_all will then read the rest of the file. Instead, it seemingly reads nothing. My script:
use strict;
use warnings;
use Text::CSV;

my $csv = Text::CSV->new ({ sep_char => ",", eol => "\n",
                            binary => 1, auto_diag => 2, })
  || die ("$0: new CSV failed: " . Text::CSV->error_diag ());

my $filename = "/tmp/junk.csv";

# open file.
open (my $fh, "<", $filename) || die "CSV open($filename) failed: $!";

my $header_line;
while (<$fh>) {
  next unless /^fieldname/;  # skip to header line
  $header_line = $_;
}

# parse that line and set our column names.
my $status = $csv->parse ($header_line);
if (! $status) {
  die "failed to parse header line from $filename: $header_line";
}
my @columns = $csv->fields ();
warn "got columns: @columns\n";
$csv->column_names (\@columns);

# make a hash of each line, save in list.
while (my $ref = $csv->getline_hr ($fh)) {
  warn "got ref: $ref\n";
}
#my $ret = $csv->getline_hr_all ($fh);
#warn "ret from _all: @$ret\n";

close ($fh) || die "CSV close($filename) failed: $!";
Apparently gets the field names ok:
got columns: fieldname1 fieldname2
but there is no more output. Not surprisingly, getline_hr_all (the commented-out lines at the end) merely returns the empty list. I'm guessing my reading from the raw filehandle to skip the initial lines is interfering with Text::CSV's reading. Or is it something else entirely? I've perused the pod for Text::CSV for quite a while, as well as doing general web searches, but could not find the answer. Any help appreciated. Thanks.

Replies are listed 'Best First'.
Re: skip junk lines in csv before header
by Marshall (Canon) on Jul 24, 2022 at 17:36 UTC
    This code as written consumes the entire input file:
    Stop reading more lines from the file once you have found the header line.
my $header_line;
while (<$fh>) {
  next unless /^fieldname/;  # skip to header line
  $header_line = $_;
  last;                      ####### ADD THIS #####
}
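With that `last` in place, the whole flow works. Here is a self-contained sketch of the corrected script, using an in-memory filehandle in place of the original /tmp/junk.csv (the data is the sample from the question; Text::CSV is a CPAN module and assumed installed):

```perl
use strict;
use warnings;
use Text::CSV;   # CPAN module, assumed installed

# In-memory stand-in for the original /tmp/junk.csv,
# shaped like the sample in the question.
my $data = <<'END';
first junk line
then a blank line
another junk line
fieldname1,fieldname2
value1,value2
END

open my $fh, '<', \$data or die "open: $!";

my $csv = Text::CSV->new({ binary => 1, auto_diag => 2 })
  or die "new CSV failed: " . Text::CSV->error_diag;

my $header_line;
while (<$fh>) {
  next unless /^fieldname/;   # skip junk until the header line
  $header_line = $_;
  last;                       # stop here; leave the rest for getline_hr
}

chomp $header_line;
$csv->parse($header_line) or die "cannot parse header: $header_line";
$csv->column_names([ $csv->fields ]);

# getline_hr now picks up right after the header line.
my @rows;
while (my $ref = $csv->getline_hr($fh)) {
  push @rows, $ref;
  print "got: $ref->{fieldname1} / $ref->{fieldname2}\n";
}
```

Because the `while` loop stops at the header instead of draining the file, the filehandle is positioned exactly where `getline_hr` (or `getline_hr_all`) needs it.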
      Thank you very much. I'm sorry for my stupidity in not seeing that!
Re: skip junk lines in csv before header
by tybalt89 (Monsignor) on Jul 24, 2022 at 22:31 UTC

    Alternate solution. Let perl do the work for you.

#!/usr/bin/perl
use strict; # https://perlmonks.org/?node_id=11145686
use warnings;

open my $fh, '<', \<<END or die; # FIXME for testing
first junk line
then a blank line
another junk line
fieldname1,fieldname2
value1,value2
more,body
and,still
more,body
END

do { local $/ = "fieldname"; <$fh> }; # read through "fieldname"
my $header_line = "fieldname" . <$fh>; # complete the line
print "header: $header_line";

while( <$fh> ) # FIXME for testing
  {
  print " body: $_";
  }

    Outputs:

header: fieldname1,fieldname2
 body: value1,value2
 body: more,body
 body: and,still
 body: more,body
      Can you explain how line 17 (the do { local $/ = "fieldname"; <$fh> }; statement) works? I see <$fh> only reads up to the end of line each time. What causes the do statement to complete?

      But God demonstrates His own love toward us, in that while we were yet sinners, Christ died for us. Romans 5:8 (NASB)

        do { local $/ = "fieldname"; <$fh> }; # read through "fieldname"

        The local $/ = "fieldname"; statement sets the $/ input record separator special variable (see perlvar) to the literal 'fieldname' string. This sets "paragraph" read mode (no: see Update below). It causes <$fh> to read the input stream from the beginning of the file (in this particular case) up to and including the first occurrence of the 'fieldname' string. Since the field names are apparently unambiguously known, this reads (almost) all the way through the first field name. The $/ variable is assigned local-ly in a do-block, so it reverts to its previous value (the "\n" default in this case) at the end of the block.

        my $header_line = "fieldname" . <$fh>; # complete the line

        Since we (apparently) know the header line begins with 'fieldname', assign $header_line this initial value and complete reading the line with another <$fh>. This reads through the end of the line because $/ has been restored to its original newline value.

        Update:

        This sets "paragraph" read mode: ...
        No, this is not "paragraph" read mode, it is normal read mode. See $/ in perlvar for a discussion of paragraph mode.

        In normal read mode, a file is read up to and including the first occurrence of the character sequence held in the $/ special variable, or to the end of the file if that sequence is never encountered. Usually $/ is a single "\n" (newline) character, but it can be any non-empty string; whatever non-zero-length sequence of characters it may be, this is normal read mode.
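        A core-only sketch of this, with a made-up data string and separator chosen purely for illustration:

```perl
use strict;
use warnings;

# Hypothetical data: records separated by a literal "--" marker.
my $data = "alpha--beta--gamma";
open my $fh, '<', \$data or die "open: $!";

my @records;
{
    local $/ = "--";        # any non-empty string can serve as the separator
    while (my $rec = <$fh>) {
        chomp $rec;         # chomp strips the current $/, i.e. "--"
        push @records, $rec;
    }
}                           # $/ reverts to "\n" here

print "@records\n";         # alpha beta gamma
```

        Each <$fh> returns one chunk ending in "--" (the last chunk, "gamma", simply ends at EOF), and chomp removes whatever $/ currently is, not just newlines.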


        Give a man a fish:  <%-{-{-{-<

        The do is actually redundant; a bare block suffices for localizing $INPUT_RECORD_SEPARATOR, aka $/:

use v5.12;
use warnings;
use Data::Dump qw/pp dd/;

say pp $/; # show default

say my $x = "HEADER\n" x 3 . "fieldname: BLA BLA\n" . join $/, 1..5;

open my $fh, '<', \$x;

# ignore anything prior to "fieldname"
{ local $/ = "fieldname:"; <$fh> };

say pp $/; # back to default
say "-" x 10;

say "fieldname:" . <$fh>; # till end of line

        "\n" HEADER HEADER HEADER fieldname: BLA BLA 1 2 3 4 5 "\n" ---------- fieldname: BLA BLA

        Cheers Rolf
        (addicted to the Perl Programming Language :)
        Wikisyntax for the Monastery