in reply to Re: More efficient way to exclude footers
in thread More efficient way to exclude footers

Hello, babysFirstPerl.
There is another way: using regular expressions. If I am correct, this program has to go over the input only once, so it shouldn't be slow:
use strict;
use warnings;

my $header_lines = $ARGV[0] // 0;
my $footer_lines = $ARGV[1] // 0;

my $whole_input;
# slurp whole file into one scalar variable
{local $/ ; $whole_input = <DATA>};
# (this can exceed memory if data is too much)

# define what line is in regular expression language:
# not newline x (zero or more times) + one newline after
my $line_regex = qr/[^\n]*\n/;

# treat whole input as string and substitute lines with empty strings:
$whole_input =~ s/\A (?:$line_regex){$header_lines} //x; # delete some lines from the beginning
$whole_input =~ s/ (?:$line_regex){$footer_lines} \z//x; # delete some lines from the ending

print $whole_input; # now it is not whole, and you can parse

__DATA__
Header 1
Header 2
Text 1
Text 2
Text 3
Text 4
Text 5
Footer 1
Footer 2
Footer 3
But if the last line of the file does not end with a newline, the second regex does not match and deletes nothing.

Re^3: More efficient way to exclude footers
by AnomalousMonk (Archbishop) on Aug 19, 2015 at 17:51 UTC
    But if the last line of the file does not end with a newline, the second regex does not match and deletes nothing.

    That can easily be fixed by changing the regex object definition
        my $line_regex = qr/[^\n]*\n/;
    to
        my $line_regex = qr/[^\n]*\n?/;
    (note the final \n has a ? quantifier added). (Tested.)
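
For anyone who wants to try the fix in isolation, here is a minimal sketch. The helper name strip_lines and the sample string are made up for illustration and are not from the thread; the two substitutions are the ones from the post above, with the optional-newline definition of $line_regex.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): strip $head leading and
# $tail trailing lines from a string that is held entirely in memory.
sub strip_lines {
    my ($text, $head, $tail) = @_;
    my $line_regex = qr/[^\n]*\n?/;              # final newline is optional
    $text =~ s/\A (?:$line_regex){$head} //x;    # delete lines from the beginning
    $text =~ s/ (?:$line_regex){$tail} \z//x;    # delete lines from the ending
    return $text;
}

# Sample input whose last line has no trailing newline.
my $input = "Header 1\nText 1\nText 2\nFooter 1";
print strip_lines($input, 1, 1);
```

With the original definition (newline required), the second substitution would leave "Footer 1" in place for this input, because the last line never matches.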

    But you need to go one step further in the example: show extraction of each remaining line for further processing.

    Update: And see also File::Slurp.


    Give a man a fish:  <%-{-{-{-<

      my $line_regex = qr/[^\n]*\n?/;
      Thanks :) Hm. If we get nonsense input, N and M with a file that has fewer than N + M lines, then this regex deletes all lines. The earlier regex (without the question mark) instead fails to match when there are too few lines, so it deletes nothing. In that case we can give the greedy quantifier a lower limit (add "0,"):
      s/\A (?:$line_regex){0,$header_lines} //x;
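
A small sketch of the difference between the exact and the bounded quantifier, assuming the original $line_regex (newline required). The helper names and the two-line sample are hypothetical. The quantifier is written without a space after the comma, since not every Perl version accepts blanks inside the braces.

```perl
use strict;
use warnings;

my $line_regex = qr/[^\n]*\n/;    # original definition: newline required

# Hypothetical helper: delete exactly $n leading lines, or nothing at all
# if the text has fewer than $n lines (the substitution fails to match).
sub strip_head_exactly {
    my ($text, $n) = @_;
    $text =~ s/\A (?:$line_regex){$n} //x;
    return $text;
}

# Hypothetical helper: delete up to $n leading lines, as many as exist.
sub strip_head_at_most {
    my ($text, $n) = @_;
    $text =~ s/\A (?:$line_regex){0,$n} //x;
    return $text;
}

my $short = "Only 1\nOnly 2\n";    # two lines, but five are requested
print "exactly: [", strip_head_exactly($short, 5), "]\n";
print "at most: [", strip_head_at_most($short, 5), "]\n";
```

Which behavior is right for too-short input (keep everything or delete everything) depends on what the caller expects, so this only shows the two options side by side.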

      >>But you need to go one step further in the example: show extraction of each remaining line for further processing.
      chomp $whole_input; parse( $_ ) for split /\n/, $whole_input;
      But now the split takes extra time :/
      Upd.: after the split, these lines have no newlines. If the parser needs the newlines, the first split argument could be /^/m
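
To make the two split variants concrete, a minimal sketch follows. The helper names and the sample text are assumptions for illustration only. Note that no chomp is done here, because a chomp before splitting on /^/m would strip the newline from the last line.

```perl
use strict;
use warnings;

# Hypothetical helper: newlines are consumed as separators.
sub lines_without_newlines {
    my ($text) = @_;
    return split /\n/, $text;
}

# Hypothetical helper: split at the start of each line, so every
# element keeps its own trailing newline.
sub lines_with_newlines {
    my ($text) = @_;
    return split /^/m, $text;
}

my $text = "Text 1\nText 2\nText 3\n";

my @bare = lines_without_newlines($text);
my @kept = lines_with_newlines($text);

print scalar(@bare), " lines without newlines\n";
print scalar(@kept), " lines with newlines\n";
print "last kept line still ends with a newline\n" if $kept[-1] =~ /\n\z/;
```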