karlberry has asked for the wisdom of the Perl Monks concerning the following question:

I have a csv (actually tsv, but I don't think that matters) with junk lines before the header. I want to parse it with Text::CSV. The file looks like:
first junk line
then a blank line
another junk line
fieldname1,fieldname2
value1,value2
It is trivial to skip ahead to the /fieldname/ line, since I know what the first field name is. What I cannot figure out is how to manage the filehandle so that getline_hr_all will then read the rest of the file. Instead, it seemingly reads nothing. My script:
use strict;
use warnings;
use Text::CSV;

my $csv = Text::CSV->new ({ sep_char => ",", eol => "\n",
                            binary => 1, auto_diag => 2, })
  || die ("$0: new CSV failed: " . Text::CSV->error_diag ());

my $filename = "/tmp/junk.csv";

# open file.
open (my $fh, "<", $filename) || die "CSV open($filename) failed: $!";

my $header_line;
while (<$fh>) {
  next unless /^fieldname/;  # skip to header line
  $header_line = $_;
}

# parse that line and set our column names.
my $status = $csv->parse ($header_line);
if (! $status) {
  die "failed to parse header line from $filename: $header_line";
}
my @columns = $csv->fields ();
warn "got columns: @columns\n";
$csv->column_names (\@columns);

# make a hash of each line, save in list.
while (my $ref = $csv->getline_hr ($fh)) {
  warn "got ref: $ref\n";
}
#my $ret = $csv->getline_hr_all ($fh);
#warn "ret from _all: @$ret\n";

close ($fh) || die "CSV close($filename) failed: $!";
Apparently gets the field names ok:
got columns: fieldname1 fieldname2
but there is no more output. Not surprisingly, getline_hr_all (the commented-out lines at the end) merely returns the empty list. I'm guessing my reading from the raw filehandle to skip the initial lines is interfering with Text::CSV's reading. Or is it something else entirely? I've perused the pod for Text::CSV for quite a while, as well as doing general web searches, but could not find the answer. Any help appreciated. Thanks.

Replies are listed 'Best First'.
Re: skip junk lines in csv before header
by Marshall (Canon) on Jul 24, 2022 at 17:36 UTC
    This code as written consumes the entire input file:
    Stop reading more lines from the file once you have found the header line.
my $header_line;
while (<$fh>) {
  next unless /^fieldname/;  # skip to header line
  $header_line = $_;
  last;                      ####### ADD THIS #####
}
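With that `last` in place, the whole flow works. Here is a self-contained sketch of the corrected script, using an in-memory filehandle in place of the original /tmp/junk.csv (the data is the sample from the question; Text::CSV is a CPAN module and assumed installed):

```perl
use strict;
use warnings;
use Text::CSV;   # CPAN module, assumed installed

# In-memory stand-in for the original /tmp/junk.csv,
# shaped like the sample in the question.
my $data = <<'END';
first junk line
then a blank line
another junk line
fieldname1,fieldname2
value1,value2
END

open my $fh, '<', \$data or die "open: $!";

my $csv = Text::CSV->new({ binary => 1, auto_diag => 2 })
  or die "new CSV failed: " . Text::CSV->error_diag;

my $header_line;
while (<$fh>) {
  next unless /^fieldname/;   # skip junk until the header line
  $header_line = $_;
  last;                       # stop here; leave the rest for getline_hr
}

chomp $header_line;
$csv->parse($header_line) or die "cannot parse header: $header_line";
$csv->column_names([ $csv->fields ]);

# getline_hr now picks up right after the header line.
my @rows;
while (my $ref = $csv->getline_hr($fh)) {
  push @rows, $ref;
  print "got: $ref->{fieldname1} / $ref->{fieldname2}\n";
}
```

Because the `while` loop stops at the header instead of draining the file, the filehandle is positioned exactly where `getline_hr` (or `getline_hr_all`) needs it.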
      Thank you very much. I'm sorry for my stupidity in not seeing that!
Re: skip junk lines in csv before header
by tybalt89 (Monsignor) on Jul 24, 2022 at 22:31 UTC

    Alternate solution. Let perl do the work for you.

#!/usr/bin/perl
use strict; # https://perlmonks.org/?node_id=11145686
use warnings;

open my $fh, '<', \<<END or die; # FIXME for testing
first junk line
then a blank line
another junk line
fieldname1,fieldname2
value1,value2
more,body
and,still
more,body
END

do { local $/ = "fieldname"; <$fh> }; # read through "fieldname"
my $header_line = "fieldname" . <$fh>; # complete the line
print "header: $header_line";

while( <$fh> ) # FIXME for testing
  {
  print " body: $_";
  }

    Outputs:

header: fieldname1,fieldname2
 body: value1,value2
 body: more,body
 body: and,still
 body: more,body
      Can you explain how line 17 (the do { local $/ = "fieldname"; <$fh> }; statement) works? I see <$fh> only reads up to the end of line each time. What causes the do statement to complete?

      But God demonstrates His own love toward us, in that while we were yet sinners, Christ died for us. Romans 5:8 (NASB)

        do { local $/ = "fieldname"; <$fh> }; # read through "fieldname"

        The local $/ = "fieldname"; statement sets the $/ input record separator special variable (see perlvar) to the literal 'fieldname' string. This sets "paragraph" read mode (no: see Update below). It causes <$fh> to read the input stream from the beginning of the file (in this particular case) up to and including the first occurrence of the 'fieldname' string. Since the field names are apparently unambiguously known, this reads (almost) all the way through the first field name. The $/ variable is assigned local-ly in a do-block, so it reverts to its previous value (the "\n" default in this case) at the end of the block.

        my $header_line = "fieldname" . <$fh>; # complete the line

        Since we (apparently) know the header line begins with 'fieldname', assign $header_line this initial value and complete reading the line with another <$fh>. This reads through the end of the line because $/ has been restored to its original newline value.

        Update:

        This sets "paragraph" read mode: ...
        No, this is not "paragraph" read mode, it is normal read mode. See $/ in perlvar for a discussion of paragraph mode.

        In normal read mode, a file is read up to and including the first occurrence of the character sequence held in the $/ special variable, or to the end of the file if that sequence is never encountered. Usually $/ is a single "\n" (newline) character, but it can be any non-empty string; whatever non-zero-length sequence of characters it may be, this is normal read mode.
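        A core-only sketch of this, with a made-up data string and separator chosen purely for illustration:

```perl
use strict;
use warnings;

# Hypothetical data: records separated by a literal "--" marker.
my $data = "alpha--beta--gamma";
open my $fh, '<', \$data or die "open: $!";

my @records;
{
    local $/ = "--";        # any non-empty string can serve as the separator
    while (my $rec = <$fh>) {
        chomp $rec;         # chomp strips the current $/, i.e. "--"
        push @records, $rec;
    }
}                           # $/ reverts to "\n" here

print "@records\n";         # alpha beta gamma
```

        Each <$fh> returns one chunk ending in "--" (the last chunk, "gamma", simply ends at EOF), and chomp removes whatever $/ currently is, not just newlines.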


        Give a man a fish:  <%-{-{-{-<

        The do is actually redundant; a bare block suffices for localizing $INPUT_RECORD_SEPARATOR, aka $/:

use v5.12;
use warnings;
use Data::Dump qw/pp dd/;

say pp $/; # show default

say my $x = "HEADER\n" x 3 . "fieldname: BLA BLA\n" . join $/, 1..5;

open my $fh, '<', \$x;

# ignore anything prior to "fieldname"
{ local $/ = "fieldname:"; <$fh> };

say pp $/; # back to default
say "-" x 10;

say "fieldname:" . <$fh>; # till end of line

        "\n" HEADER HEADER HEADER fieldname: BLA BLA 1 2 3 4 5 "\n" ---------- fieldname: BLA BLA

        Cheers Rolf
        (addicted to the Perl Programming Language :)
        Wikisyntax for the Monastery