in reply to More efficient way to exclude footers

Hello babysFirstPerl, and welcome to the Monastery!

Am I correct in thinking that the file begins with a single header of $ARGV[0] lines and ends with a single footer of $ARGV[1] lines? If so, the following approach should do what you want. It reads the file exactly once, and processes it on-the-fly so that the number of lines held in memory never exceeds one plus the number of lines in the footer:

    #! perl
    use strict;
    use warnings;

    my $header_lines = $ARGV[0] // 0;
    my $footer_lines = $ARGV[1] // 0;

    <DATA> for 1 .. $header_lines;    # Throw away the header

    my @lines;

    while (<DATA>)
    {
        push @lines, parse_line($_);
        print shift @lines if @lines > $footer_lines;
    }

    sub parse_line
    {
        my ($line) = @_;
        # ...Parse $line...
        return $line;
    }

    __DATA__
    Header 1
    Header 2
    Text 1
    Text 2
    Text 3
    Text 4
    Text 5
    Footer 1
    Footer 2
    Footer 3

Output:

    0:50 >perl 1348_SoPW.pl 2 3
    Text 1
    Text 2
    Text 3
    Text 4
    Text 5

    0:55 >
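
For a real file rather than the DATA section, the same rolling-buffer idea might be wrapped in a sub that takes filehandles. This is only a minimal sketch: the name copy_body and its argument order are made up for illustration, and the sample data is fed in through an in-memory filehandle (which needs Perl 5.8 or later).

```perl
use strict;
use warnings;

# Hypothetical helper (not from the post above): copy the body of $in to
# $out, skipping $header_lines leading and $footer_lines trailing lines.
sub copy_body {
    my ($in, $out, $header_lines, $footer_lines) = @_;

    scalar <$in> for 1 .. $header_lines;    # throw away the header

    my @lines;                              # rolling buffer
    while (my $line = <$in>) {
        push @lines, $line;
        # Once more than $footer_lines lines are buffered, the oldest
        # one cannot belong to the footer, so it is safe to emit it.
        print {$out} shift @lines if @lines > $footer_lines;
    }
    return;                                 # @lines now holds only the footer
}

my $sample = <<'END';
Header 1
Header 2
Text 1
Text 2
Text 3
Text 4
Text 5
Footer 1
Footer 2
Footer 3
END

open my $in, '<', \$sample or die "Cannot open in-memory file: $!";
copy_body($in, \*STDOUT, 2, 3);             # prints Text 1 .. Text 5
close $in;
```

To use it on a file on disk, open the file for reading and pass that handle in place of $in; the buffer still never holds more than one line beyond the footer.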

Update: A couple of additional points:

Hope that helps,

Athanasius <°(((><contra mundum Iustus alius egestas vitae, eros Piratica,

Re^2: More efficient way to exclude footers
by AnomalousMonk (Archbishop) on Aug 19, 2015 at 17:25 UTC

    A couple of small points, so small, in fact, that I hesitate to mention them... Ah, what the heck...

    • The command-line parameter capture statements of the form
          my $header_lines = $ARGV[0] // 0;
      could be written
          my $header_lines = $ARGV[0] || 0;
      (logical-or || instead of defined-or //) to make the statements Perl version-agnostic (defined-or was not introduced until version 5.10). All the rest of the code seems to require nothing more than version 5.0.0. (Tested under 5.8.9.)
    • The while-loop line processing code
          push @lines, parse_line($_);
          print shift @lines if @lines > $footer_lines;
      could be written
          push @lines, $_;
          print parse_line(shift @lines) if @lines > $footer_lines;
      to avoid parsing footer lines (although they still would be read). I have to admit that with only a dozen footer lines to deal with, it's hard to imagine this would make any detectable difference, but if line parsing is extremely expensive... Who knows? (This change also tested.)
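
The effect of the second point can be checked by counting calls to the parser. The following is a rough sketch only: body_of is a made-up wrapper around the modified loop, and parse_line is a stand-in that does nothing except count.

```perl
use strict;
use warnings;

my $parse_calls = 0;

# Stand-in for the real parser; it only counts how often it is called.
sub parse_line {
    my ($line) = @_;
    $parse_calls++;
    return $line;
}

# Hypothetical wrapper around the modified loop: buffer raw lines and
# parse each one only when it leaves the buffer.
sub body_of {
    my ($text, $header_lines, $footer_lines) = @_;
    open my $fh, '<', \$text or die "Cannot open in-memory file: $!";

    scalar <$fh> for 1 .. $header_lines;    # throw away the header

    my (@lines, @out);
    while (my $line = <$fh>) {
        push @lines, $line;                 # buffer the unparsed line
        push @out, parse_line(shift @lines)
            if @lines > $footer_lines;      # parse only on release
    }
    close $fh;
    return @out;
}

my $sample = <<'END';
Header 1
Header 2
Text 1
Text 2
Text 3
Text 4
Text 5
Footer 1
Footer 2
Footer 3
END

my @body = body_of($sample, 2, 3);
print @body;
print "parse_line was called $parse_calls times\n";    # 5 times
```

With the sample data, the parser runs once per body line (5 calls); parsing on push, as in the original loop, would also parse the 3 footer lines (8 calls).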
    Anyway, my two cents, maybe I'll squeeze some XP outta it.


    Give a man a fish:  <%-{-{-{-<

Re^2: More efficient way to exclude footers
by rsFalse (Chaplain) on Aug 19, 2015 at 17:31 UTC
    Hello, babysFirstPerl.
    There is another way: using regular expressions. If I am correct, this program goes over the input only once, so it shouldn't be slow:

        use strict;
        use warnings;

        my $header_lines = $ARGV[0] // 0;
        my $footer_lines = $ARGV[1] // 0;

        my $whole_input;
        # slurp whole file into one scalar variable
        # (this can exceed memory if data is too much)
        { local $/; $whole_input = <DATA> };

        # define what a line is in regular expression language:
        # not-newline (zero or more times) followed by one newline
        my $line_regex = qr/[^\n]*\n/;

        # treat whole input as a string and substitute lines with empty strings:
        $whole_input =~ s/\A (?:$line_regex){$header_lines} //x;  # delete some lines from the beginning
        $whole_input =~ s/ (?:$line_regex){$footer_lines} \z//x;  # delete some lines from the end

        print $whole_input;  # now it is not whole, and you can parse

        __DATA__
        Header 1
        Header 2
        Text 1
        Text 2
        Text 3
        Text 4
        Text 5
        Footer 1
        Footer 2
        Footer 3

    But if the last line of the file does not end with a newline, the second regex does not match and nothing is deleted.
      But if the last line of the file does not end with a newline, the second regex does not match and nothing is deleted.

      That can easily be fixed by changing the regex object definition
          my $line_regex = qr/[^\n]*\n/;
      to
          my $line_regex = qr/[^\n]*\n?/;
      (note that the final \n has a ? quantifier added). (Tested.)
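
A small sketch of that fix, assuming the two substitutions are wrapped in a sub (strip_header_footer is a made-up name), shows the same body coming back whether or not the input ends with a newline:

```perl
use strict;
use warnings;

# Hypothetical wrapper around the two substitutions, using the line
# regex with the optional trailing newline.
sub strip_header_footer {
    my ($text, $header_lines, $footer_lines) = @_;
    my $line_regex = qr/[^\n]*\n?/;

    $text =~ s/\A (?:$line_regex){$header_lines} //x;   # drop leading lines
    $text =~ s/ (?:$line_regex){$footer_lines} \z//x;   # drop trailing lines
    return $text;
}

my $with_newline = <<'END';
Header 1
Header 2
Text 1
Text 2
Text 3
Text 4
Text 5
Footer 1
Footer 2
Footer 3
END

# Same data, but with the final newline removed.
(my $without_newline = $with_newline) =~ s/\n\z//;

my $body_a = strip_header_footer($with_newline,    2, 3);
my $body_b = strip_header_footer($without_newline, 2, 3);

print $body_a;
print $body_a eq $body_b ? "same body\n" : "bodies differ\n";
```

Because every piece of the pattern can now match the empty string, the substitutions always succeed, which is worth keeping in mind when the file is shorter than the header and footer counts combined.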

      But you need to go one step further in the example: show extraction of each remaining line for further processing.

      Update: And see also File::Slurp.


      Give a man a fish:  <%-{-{-{-<

        my $line_regex = qr/[^\n]*\n?/;
        Thanks :) . Hm. And if we have some nonsense input, N and M, with the file having fewer than N + M lines, then this regex deletes all lines, whereas the earlier regex (without the question mark) fails to match when asked to delete too many lines, and so deletes nothing. In that case we can give the greedy quantifier a lower limit (add "0,", with no space inside the braces):
        s/\A (?:$line_regex){0,$header_lines}   //x;

        >>But you need to go one step further in the example: show extraction of each remaining line for further processing.
        chomp $whole_input; parse( $_ ) for split /\n/, $whole_input;
        But now it takes time for split :/
        Upd.: the lines produced by this split have no trailing newlines. If the parser needs the newlines, the first split parameter could be /^/m instead.
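
The difference between the two split patterns can be seen on a short string. This is just an illustration; the variable names are invented for the example.

```perl
use strict;
use warnings;

my $body = "Text 1\nText 2\nText 3\n";

# Splitting on the newline itself consumes it: lines come back bare.
my @bare = split /\n/, $body;

# Splitting at each start-of-line position consumes nothing: every
# line keeps its trailing newline.
my @kept = split /^/m, $body;

printf "bare: %d lines, last is '%s'\n", scalar @bare, $bare[-1];
printf "kept: %d lines, last ends with newline: %s\n",
    scalar @kept, ($kept[-1] =~ /\n\z/ ? 'yes' : 'no');
```

Note that if the string is chomped first, as in the one-liner above, the last element from split /^/m will be the only one without a newline.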