
The foreach loop is hungry: it reads the filehandle in list context, so every line is pulled into memory before the first iteration runs. A while loop avoids that by merely nibbling at the cake until it's gone. Without knowing the actual details you face, I assume the speed issue is one of reading the file rather than processing individual lines. That's the key bit others pointed out.

While I do not know what your count_bases() function is meant to do, splitting the reading and processing steps may help you. I assume you want to do regex matching or something else that can be done on individual lines.

while (my $line = <>) {
    foreach my $count (20, 30) {
        count_bases($line, $count);
    }
}

The foreach loop makes it easy to extend the particular processing steps on individual lines (e.g. through a dispatch table with code references). The while loop simply keeps running as long as there is input, not trying to store it all in memory at once.
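To make the difference between the two ways of reading concrete, here is a minimal sketch. The sample data and the in-memory filehandle are my own assumptions, used only so that it runs without a real file; with a large file, the foreach version is the one that builds the full list of lines first.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical sample input, read through an in-memory filehandle.
my $data = "ACGT\nGGCC\nTTAA\n";

# List context: <$fh> returns every line at once, so the whole input
# sits in a temporary list before the loop body runs even once.
open my $fh, '<', \$data or die "Cannot open in-memory file: $!";
my $slurped = 0;
foreach my $line (<$fh>) {
    $slurped++;
}
close $fh;

# Scalar context: <$fh> returns one line per iteration, so only the
# current line is held in memory.
open $fh, '<', \$data or die "Cannot open in-memory file: $!";
my $streamed = 0;
while (my $line = <$fh>) {
    $streamed++;
}
close $fh;

print "foreach saw $slurped lines, while saw $streamed lines\n";
```

Both loops see the same three lines; only the amount of input held in memory at any one time differs.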

Code references (references to subroutines, named or anonymous) are very useful for dispatch tables: they allow you to easily parametrize behaviour of your program. You can store them in arrays or hashes and later on loop over those arrays to ensure all (or only specific) actions are taken. Higher Order Perl by Mark Jason Dominus has a chapter that I find quite instructive.

A (particularly useless) example of what I mean is below. Adding more steps is trivial: add more items to @actions (making sure the subs accept the same arguments). You could also use hashes of course and select the code to be executed based on actual input.

#!/usr/bin/perl
use strict;
use warnings;

sub prefix_line {
    my $lineno = shift @_;
    my $line = shift @_;
    
    #            Odd line?     Yes    No
    my $prefix = $lineno % 2 ? q{+} : q{o};
    print qq{$prefix };
}

sub print_line {
    my $lineno = shift @_;
    my $line = shift @_;
    
    # Print our actual line
    print qq{$line};
}

my @actions = (
    \&prefix_line,
    \&print_line,
);

my $lineno = 0;
while (my $line = <>) {
    $lineno++;
    foreach my $action (@actions) {
        $action->($lineno, $line);
    }
}
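For the hash variant mentioned above, a minimal sketch could look like the following. The keywords (upper, reverse), the process_line() sub and the sample data are all hypothetical, made up for illustration; the point is only that the first word of each line selects which code reference gets called.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical actions, keyed by the first word of each input line.
my %action_for = (
    upper   => sub { my ($text) = @_; return uc $text },
    reverse => sub { my ($text) = @_; return scalar reverse $text },
);

# Fallback for keywords without an entry in the table.
my $default = sub { my ($text) = @_; return $text };

sub process_line {
    my ($line) = @_;
    chomp $line;

    # Split into a keyword and the rest of the line.
    my ($keyword, $text) = split ' ', $line, 2;
    return '' unless defined $keyword;    # empty line
    $text = '' unless defined $text;      # keyword without text

    # Look up the action; unknown keywords pass the text through as is.
    my $action = $action_for{$keyword} || $default;
    return $action->($text);
}

# Hypothetical sample input via an in-memory filehandle.
my $data = "upper acgt\nreverse acgt\nother acgt\n";
open my $fh, '<', \$data or die "Cannot open in-memory file: $!";
while (my $line = <$fh>) {
    print process_line($line), "\n";
}
close $fh;
```

Supporting a new keyword means adding one entry to %action_for; the loop itself stays untouched.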

In reply to Re^3: Efficiently processing a file by rkrieger
in thread Efficiently processing a file by Anonymous Monk