Re: More efficient way to exclude footers

Hello babysFirstPerl, and welcome to the Monastery!

Am I correct in thinking that the file begins with a single header of $ARGV[0] lines and ends with a single footer of $ARGV[1] lines? If so, the following approach should do what you want. It reads the file exactly once, and processes it on-the-fly so that the number of lines held in memory never exceeds one plus the number of lines in the footer:

#! perl
use strict;
use warnings;

my $header_lines = $ARGV[0] // 0;
my $footer_lines = $ARGV[1] // 0;

<DATA> for 1 .. $header_lines;   # Throw away the header

my @lines;

while (<DATA>)
{
    push @lines, parse_line($_);
    print shift @lines if @lines > $footer_lines;
}

sub parse_line
{
    my ($line) = @_;
    # ...Parse $line...
    return $line;
}

__DATA__
Header 1
Header 2
Text 1
Text 2
Text 3
Text 4
Text 5
Footer 1
Footer 2
Footer 3

Output:

 0:50 >perl 1348_SoPW.pl 2 3
Text 1
Text 2
Text 3
Text 4
Text 5

 0:55 >

Update: A couple of additional points:

This test: if ($. != $numHeaders && ... should be if ($. > $numHeaders && ....
If you have to read a file more than once, you don’t have to close and re-open it: just use seek.
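
For the second point, here is a minimal sketch of a two-pass read that rewinds with seek instead of closing and re-opening. The helper name `body_via_seek` and the in-memory sample file are made up for illustration; they are not from the original question. As far as I know, seek leaves `$.` alone, so the sketch resets it by hand:

```perl
use strict;
use warnings;

# Hypothetical helper: return the lines of a seekable handle that lie between
# the first $headers lines and the last $footers lines, using two passes.
sub body_via_seek {
    my ($fh, $headers, $footers) = @_;

    # Pass 1: count the lines.
    my $total = 0;
    while (defined(my $line = <$fh>)) { $total++ }

    # Rewind to byte offset 0 instead of closing and re-opening the file.
    seek($fh, 0, 0) or die "Cannot seek: $!";
    $. = 0;    # reset the line counter by hand (assumed not to be reset by seek)

    # Pass 2: keep only the lines between header and footer.
    my @body;
    while (defined(my $line = <$fh>)) {
        next if $. <= $headers;             # still inside the header
        last if $. >  $total - $footers;    # reached the footer
        push @body, $line;
    }
    return @body;
}

# Demo on an in-memory file so the sketch is self-contained.
my $text = "Header 1\nText 1\nText 2\nFooter 1\n";
open my $fh, '<', \$text or die "Cannot open in-memory file: $!";
print body_via_seek($fh, 1, 1);
close $fh;
```

This only works on seekable handles (plain files, in-memory files), not on pipes or terminals.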

Hope that helps,

Athanasius <°(((>< contra mundum Iustus alius egestas vitae, eros Piratica,

Re^2: More efficient way to exclude footers by AnomalousMonk (Archbishop) on Aug 19, 2015 at 17:25 UTC
A couple of small points, so small, in fact, that I hesitate to mention them... Ah, what the heck...

The command-line parameter capture statements of the form
`my $header_lines = $ARGV[0] // 0;`
could be written
`my $header_lines = $ARGV[0] || 0;`
(logical-or `||` instead of defined-or `//`) to make the statements Perl version-agnostic (defined-or was not introduced until version 5.10). All the rest of the code seems to require nothing more than version 5.0.0. (Tested under 5.8.9.)

The `while`-loop line processing code
`push @lines, parse_line($_);`
`print shift @lines if @lines > $footer_lines;`
could be written
`push @lines, $_;`
`print parse_line(shift @lines) if @lines > $footer_lines;`
to avoid parsing footer lines (although they still would be read). I have to admit that with only a dozen footer lines to deal with, it's hard to imagine this would make any detectable difference, but if line parsing is extremely expensive... Who knows? (This change also tested.)

Anyway, my two cents, maybe I'll squeeze some XP outta it.

Give a man a fish: `<%-{-{-{-<`
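
To make the second suggestion concrete, here is a rough sketch of the sliding-window loop with parsing deferred until a line leaves the window. The `$parse_calls` counter and the helper name `body_without_footer` are hypothetical, added only so the effect is visible; `parse_line` is the same placeholder as in the parent post:

```perl
use strict;
use warnings;

# Hypothetical stand-in for the real (possibly expensive) parser.
my $parse_calls = 0;
sub parse_line {
    my ($line) = @_;
    $parse_calls++;
    return $line;
}

# Return the parsed body lines of $fh, skipping $header_lines at the top
# and $footer_lines at the bottom. Only body lines are ever parsed.
sub body_without_footer {
    my ($fh, $header_lines, $footer_lines) = @_;
    my (@window, @out);
    while (my $line = <$fh>) {
        next if $. <= $header_lines;    # throw away the header
        push @window, $line;            # buffer the raw, unparsed line
        # Once the window holds more than a footer's worth of lines,
        # the oldest one cannot be footer, so it is safe to parse it.
        push @out, parse_line(shift @window) if @window > $footer_lines;
    }
    return @out;                        # whatever is left in @window is footer
}

# Demo on an in-memory file: 2 header lines, 5 text lines, 3 footer lines.
my $text = join '', map { "$_\n" }
    'Header 1', 'Header 2',
    (map { "Text $_" } 1 .. 5),
    (map { "Footer $_" } 1 .. 3);
open my $fh, '<', \$text or die "Cannot open in-memory file: $!";
print body_without_footer($fh, 2, 3);
close $fh;
print "parse_line was called $parse_calls times\n";
```

The footer lines are still read and buffered, as noted above; they just never reach the parser.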
Re^2: More efficient way to exclude footers by rsFalse (Chaplain) on Aug 19, 2015 at 17:31 UTC
Hello, babysFirstPerl. There is another way: using regular expressions. If I am correct, this program has to go over the input only once, and it isn't slow?

use strict;
use warnings;

my $header_lines = $ARGV[0] // 0;
my $footer_lines = $ARGV[1] // 0;

my $whole_input;
# slurp whole file into one scalar variable
{local $/ ; $whole_input = <DATA>};
# (this can exceed memory if data is too much)

# define what line is in regular expression language:
# not newline x (zero or more times) + one newline after
my $line_regex = qr/[^\n]*\n/;

# treat whole input as string and substitute lines with empty strings:
$whole_input =~ s/\A (?:$line_regex){$header_lines} //x; # delete some lines from the beginning
$whole_input =~ s/ (?:$line_regex){$footer_lines} \z//x; # delete some lines from the ending

print $whole_input; # now it is not whole, and you can parse

__DATA__
Header 1
Header 2
Text 1
Text 2
Text 3
Text 4
Text 5
Footer 1
Footer 2
Footer 3

But if the last line of the file does not end with a newline, the second regex does not match and doesn't delete anything.
Re^3: More efficient way to exclude footers by AnomalousMonk (Archbishop) on Aug 19, 2015 at 17:51 UTC
>>But if the last line of the file does not end with a newline, the second regex does not match and doesn't delete anything.

That can easily be fixed by changing the regex object definition
`my $line_regex = qr/[^\n]*\n/;`
to
`my $line_regex = qr/[^\n]*\n?/;`
(note that the final `\n` has a `?` quantifier added). (Tested.)

But you need to go one step further in the example: show extraction of each remaining line for further processing.

Update: And see also File::Slurp.

Give a man a fish: `<%-{-{-{-<`
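
A small sketch of the difference, run on input whose last line has no newline. The helper `strip_footer` and the variable names `$strict_line` / `$lenient_line` are made up for this illustration; the two patterns are the ones discussed above:

```perl
use strict;
use warnings;

# Hypothetical helper: strip $footer_lines lines from the end of $text
# using the supplied compiled line pattern.
sub strip_footer {
    my ($text, $line_regex, $footer_lines) = @_;
    $text =~ s/ (?:$line_regex){$footer_lines} \z//x;
    return $text;
}

my $strict_line  = qr/[^\n]*\n/;     # a line must end with a newline
my $lenient_line = qr/[^\n]*\n?/;    # the final newline is optional

# Note: no newline after the last footer line.
my $input = "Text 1\nText 2\nFooter 1\nFooter 2";

# The strict pattern cannot match at the end, so nothing is removed.
print "strict:\n",  strip_footer($input, $strict_line,  2), "\n";

# The lenient pattern removes both footer lines.
print "lenient:\n", strip_footer($input, $lenient_line, 2);
```

I have only tried to reason about small inputs here; how the lenient pattern behaves on very large files (time spent backtracking) is not something this sketch measures.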
Re^4: More efficient way to exclude footers by rsFalse (Chaplain) on Aug 19, 2015 at 18:09 UTC
`my $line_regex = qr/[^\n]*\n?/;`

Thanks :) . Hm. And if we have some nonsense input: N and M, with the file having fewer than N + M lines, then this regex deletes all lines. But the earlier regex (without the question mark) fails to match when asked to delete too many lines, so nothing is deleted. In that case we can set a lower limit on the greedy quantifier (add "0,"):
`s/\A (?:$line_regex){0,$header_lines} //x;`

>>But you need to go one step further in the example: show extraction of each remaining line for further processing.

`chomp $whole_input; parse( $_ ) for split /\n/, $whole_input;`

But now it takes time for split :/

Upd.: these lines after split are without newlines. If it is important for parsing to have newlines, the first split parameter could be `/^/m`.
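
Putting the bounded quantifier and the `/^/m` split together, a sketch might look like this. The helper name `body_lines` is hypothetical, and the quantifier is written without a space after the comma, on the assumption that not every perl accepts blanks inside `{m,n}`:

```perl
use strict;
use warnings;

# Hypothetical helper: drop up to $header_lines leading lines and up to
# $footer_lines trailing lines, then return the rest one line per element,
# newlines intact.
sub body_lines {
    my ($text, $header_lines, $footer_lines) = @_;
    my $line_regex = qr/[^\n]*\n/;

    # {0,N} lets the match succeed even when fewer than N lines exist.
    $text =~ s/\A (?:$line_regex){0,$header_lines} //x;
    $text =~ s/ (?:$line_regex){0,$footer_lines} \z//x;

    # Splitting on /^/m cuts at line starts, so each element keeps its "\n".
    return split /^/m, $text;
}

my $input = "Header 1\nText 1\nText 2\nFooter 1\n";

print "[$_]" for body_lines($input, 1, 1);    # each element still ends in "\n"

# Nonsense input: asking for more header and footer lines than the file has
# leaves nothing to parse.
my @none = body_lines("Only line\n", 2, 3);
print "lines left: ", scalar(@none), "\n";
```

With `split /\n/` instead, the same elements would come back without their newlines, which is the trade-off mentioned in the update above.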