How To Count Lines In File?

Cody Pendant has asked for the wisdom of the Perl Monks concerning the following question:

I just found this in the FAQ:

How do I count the number of lines in a file?

One fairly efficient way is to count newlines in the file. The following program uses a feature of tr///, as documented in perlop. If your text file doesn't end with a newline, then it's not really a proper text file, so this may report one fewer line than you expect.
    $lines = 0;
    open(FILE, $filename) or die "Can't open `$filename': $!";
    while (sysread FILE, $buffer, 4096) {
        $lines += ($buffer =~ tr/\n//);
    }
    close FILE;

This may well be a stupid question but why might that be preferred to this:

open(X, $filename);
while(<X>){}
$lines = $.;
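For reference, here is a minimal sketch of both approaches written as subs with lexical filehandles and three-argument open. The sub names count_newlines and count_records are made up for illustration and appear nowhere in the FAQ. It also shows the one behavioural difference the FAQ text hints at: the two counts disagree when the file lacks a final newline.

```perl
use strict;
use warnings;

# Count newline characters by reading fixed-size blocks (the FAQ approach).
sub count_newlines {
    my ($path) = @_;
    open(my $fh, '<', $path) or die "Can't open '$path': $!";
    binmode($fh);
    my ($lines, $buffer) = (0, '');
    while (sysread($fh, $buffer, 4096)) {
        $lines += ($buffer =~ tr/\n//);
    }
    close($fh);
    return $lines;
}

# Count records with the line-at-a-time loop and $. (the shorter alternative).
sub count_records {
    my ($path) = @_;
    open(my $fh, '<', $path) or die "Can't open '$path': $!";
    local $_;               # the implicit readline assignment goes to $_
    1 while <$fh>;
    my $lines = $.;         # read $. before close, which resets it
    close($fh);
    return $lines;
}
```

For a file containing "a\nb\nc" (no trailing newline), count_newlines returns 2 and count_records returns 3.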

--
“Every bit of code is either naturally related to the problem at hand, or else it's an accidental side effect of the fact that you happened to solve the problem using a digital computer.” M-J D


Re: How To Count Lines In File?
by jmcnamara (Monsignor) on Jan 27, 2003 at 00:00 UTC

Here is a benchmark to show the difference. I added wc for comparison:

    $ time wc -l bigfile
      91420 bigfile

    real    0m0.016s
    user    0m0.010s
    sys     0m0.006s


    $ time perl faq.pl bigfile
    91420

    real    0m0.032s
    user    0m0.027s
    sys     0m0.004s

    $ time perl cody.pl bigfile
    91420

    real    0m0.105s
    user    0m0.098s
    sys     0m0.008s

Here is the way I usually don't do it; it is twice as slow as the slowest method above: perl -le 'print $==()=<>' file

--
John.

Update: Added benchmark.


Re: How To Count Lines In File?
by tomhukins (Curate) on Jan 27, 2003 at 00:13 UTC

Here's an answer to your question, instead of yet another way to count the number of lines in a file. ;-)

The solution mentioned in the FAQ runs much faster. Run the following code:

#!/usr/bin/perl

use strict;
use warnings;

use Benchmark qw(timethese);
my $filename = '/usr/share/dict/words';

timethese(100, {
    'read_block' => sub {
        open(FILE, $filename) or die "Can't open file: $!";
        my $lines = 0;
        while (read FILE, my $buffer, 4096) {
            $lines += ($buffer =~ tr/\n//);
        }
        close FILE;
    },
    'read_line' => sub {
        open(FILE, $filename) or die "Can't open file: $!";
        while (<FILE>) {};
        my $lines = $.;
        close FILE;
    }
});

So, why does this happen? Well, the read_line approach above has to find every line ending and hand each line back as a separate string, so the loop body runs once per line. The read_block approach reads 4096 bytes at a time and counts the newlines in each block with a single tr, so far fewer operations run inside the Perl process.

The significance of 4096 is that disk block sizes are usually some multiple of 1024 bytes, so reading complete blocks helps the code run faster than if it were to read partial blocks.


Re: How To Count Lines In File?
by Abigail-II (Bishop) on Jan 27, 2003 at 00:28 UTC

    perl -lpe '}{*_=*.}{' file

Abigail

Re: Re: How To Count Lines In File?

by John M. Dlugosz (Monsignor) on Jan 27, 2003 at 06:10 UTC

So -p is implemented by textually wrapping the loop around the actual code: the leading "}" closes -p's while loop, and the final "{" balances the closing brace at the end of the expansion.
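Spelled out as an ordinary script, the shorter variant perl -lpe '}{$_=$.' file behaves roughly like the sketch below. It assumes the -p wrapper shown in perlrun (a while (<>) loop whose continue block prints). The sub name count_via_unbalanced_braces and the captured $printed variable are hypothetical, standing in for the final print so the result can be returned.

```perl
use strict;
use warnings;

sub count_via_unbalanced_braces {
    my ($path) = @_;
    local @ARGV = ($path);      # <> reads the files named in @ARGV
    local $_;
    my $printed;

    LINE:
    while (<>) {
        chomp;                  # added by -l
    }                           # the one-liner's leading "}" closes the loop here
    {                           # ...and its "{" opens a bare block
        $_ = $.;                # runs once, after the last line has been read
    }
    continue {
        $printed = $_;          # -p would print $_ here; captured for illustration
    }
    return $printed;
}
```

The point of the sketch is that the continue block ends up attached to the bare block rather than to the while loop, so the print happens once, at the end.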

I never thought about that. I always just figured that built-in looping construct was done on a syntactic boundary, like:

while (<>) {
   eval $option_e;
   print
   }

So... that's not documented or to be relied on, right?

—John

Re: Re: Re: How To Count Lines In File?

by PodMaster (Abbot) on Jan 27, 2003 at 08:15 UTC

MJD says you can't just make shit up and expect the computer to know what you mean, retardo!
** The Third rule of perl club is a statement of fact: pod is sexy.

Re: Re: Re: Re: How To Count Lines In File?

by John M. Dlugosz (Monsignor) on Jan 28, 2003 at 02:11 UTC

Re^5: How To Count Lines In File?

by Aristotle (Chancellor) on Jan 28, 2003 at 11:39 UTC

Re: Re: How To Count Lines In File?

by OM_Zen (Scribe) on Jan 27, 2003 at 02:45 UTC

WINDOWS

Re: Re: Re: How To Count Lines In File?

by ryddler (Monk) on Jan 27, 2003 at 03:45 UTC

perl -lpe "}{*_=*.}{" file

(jeffa) 2Re: How To Count Lines In File?

by jeffa (Bishop) on Jan 27, 2003 at 15:27 UTC

perl -lpe '}{$_=$.' file

On a similar note, last semester my C++ students had to write a small C++ program that averages numbers from a flat file (one per line). I wowed them with this little doozy:

perl -lpe '$s+=$_}{$_=$s/$.' file
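Written out longhand, the averaging one-liner amounts to something like the following sketch. The sub name average_of_file is made up for illustration, and returning undef for an empty file is an assumption of this sketch (the one-liner itself would divide by zero there).

```perl
use strict;
use warnings;

sub average_of_file {
    my ($path) = @_;
    open(my $fh, '<', $path) or die "Can't open '$path': $!";
    my ($sum, $count) = (0, 0);
    while (my $line = <$fh>) {
        chomp $line;
        $sum += $line;          # $s += $_ in the one-liner
        $count++;               # the one-liner uses $. for this
    }
    close($fh);
    return $count ? $sum / $count : undef;
}
```

Like the one-liner, this divides by the number of lines read, so blank lines count as zeros.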

jeffa

L-LL-L--L-LL-L--L-LL-L--
-R--R-RR-R--R-RR-R--R-RR
B--B--B--B--B--B--B--B--
H---H---H---H---H---H---
(the triplet paradiddle with high-hat)


Re: How To Count Lines In File?

by Abigail-II (Bishop) on Jan 27, 2003 at 21:46 UTC

Is there any gain in using globs over the variables themselves

If you can answer the question "is there any gain in using perl -ple '}{*_=*.}{' file over wc -l file", you can figure out the other question yourself.

Abigail


Re: Re: How To Count Lines In File?

by Anonymous Monk on Jan 27, 2003 at 22:19 UTC

Re: Re: Re: How To Count Lines In File?

by sauoq (Abbot) on Jan 27, 2003 at 23:48 UTC

Re: How To Count Lines In File?
by Aristotle (Chancellor) on Jan 27, 2003 at 02:55 UTC

{
    local ($/, $_) = (\4096);
    $lines += tr/\n// while <$fh>;
}
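The trick in that snippet is that setting $/ to a reference to an integer makes <$fh> return fixed-size records instead of lines. A self-contained sketch of the same idea, wrapped in a hypothetical sub named count_newlines_by_record that takes an already opened filehandle:

```perl
use strict;
use warnings;

sub count_newlines_by_record {
    my ($fh) = @_;
    my $lines = 0;
    local $/ = \4096;       # each <$fh> now returns up to 4096 bytes
    local $_;               # keep the caller's $_ intact
    $lines += tr/\n// while <$fh>;
    return $lines;
}
```

It might be called as count_newlines_by_record($fh) right after opening the file; both $/ and $_ are restored when the sub returns.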

Makeshifts last the longest.

Re: How To Count Lines In File?
by John M. Dlugosz (Monsignor) on Jan 27, 2003 at 06:01 UTC

My first Perl program ever was very much your simple way to count lines. Perhaps I didn't even know about $. and incremented a counter in the body of the loop.

The interesting part is: I never got a result. It ran so slowly on my PC (a 16 MHz 80386SX, I believe) that I killed it when it was taking too long to finish.

I was disappointed that it ran so slowly and was not useful. But, when Perl was young and AWK was all the rage, didn't all computers have speeds in that order of magnitude? Perhaps the disk IO was eating it alive due to a poor or immature 32-bit environment that had to trap to real-mode DOS on every file-read call.

So... when Perl was young and the FAQ was being written, perhaps the buffer-at-a-time approach was significantly better performing. The logic of <FILE> to read up to the next newline might have been primitive in the early days, and changed when memory became cheap and buffers were no big deal.

—John

Re: How To Count Lines In File?
by broquaint (Abbot) on Jan 27, 2003 at 12:43 UTC

perl -e 'print @{[<>]}.$/' some_file_here

_________ broquaint

Re^2: How To Count Lines In File?

by Aristotle (Chancellor) on Jan 27, 2003 at 16:07 UTC

tr

$ perl -lp0777e'$_=tr/\n//' foo
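The -0777 switch makes perl read the whole file as a single record, so tr sees everything at once. In script form that is roughly the sketch below. The sub name count_newlines_slurp is hypothetical, and the approach assumes the file fits comfortably in memory.

```perl
use strict;
use warnings;

sub count_newlines_slurp {
    my ($path) = @_;
    open(my $fh, '<', $path) or die "Can't open '$path': $!";
    local $/;                       # undef: the next read returns the whole file
    my $content = <$fh>;
    close($fh);
    return defined $content ? ($content =~ tr/\n//) : 0;
}
```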

Makeshifts last the longest.

Re: How To Count Lines In File?
by ibanix (Hermit) on Jan 26, 2003 at 23:16 UTC

my $lines = `wc -l $filename`;
$lines =~ /(\d+)\s/;
$lines = $1;

$ echo '$0 & $0 &' > foo; chmod a+x foo; foo;


Re^2: How To Count Lines In File?

by Aristotle (Chancellor) on Jan 27, 2003 at 00:06 UTC

wc

$lines

$1

previous

$1

if

Makeshifts last the longest.

Re: How To Count Lines In File?

by Cody Pendant (Prior) on Jan 26, 2003 at 23:59 UTC

--
“Every bit of code is either naturally related to the problem at hand, or else it's an accidental side effect of the fact that you happened to solve the problem using a digital computer.” M-J D

Re: Re: How To Count Lines In File?

by Gilimanjaro (Hermit) on Jan 26, 2003 at 23:24 UTC

my ($lines) = `wc -l $filename` =~ /(\d+)/;
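Both wc snippets in this subthread interpolate $filename into a shell command, which goes wrong for names containing spaces or shell metacharacters. A hedged sketch that avoids the shell with the list form of a pipe open and checks that wc actually printed a number. The sub name count_lines_wc is made up, and it assumes a wc binary on the PATH.

```perl
use strict;
use warnings;

sub count_lines_wc {
    my ($path) = @_;
    # List-form pipe open: no shell is involved, so $path reaches wc verbatim.
    open(my $wc, '-|', 'wc', '-l', '--', $path)
        or die "Can't run wc: $!";
    my $output = <$wc>;
    close($wc) or die "wc failed on '$path'";
    defined $output or die "wc printed nothing for '$path'";
    # Capture in list context so a failed match cannot leave a stale $1 behind.
    my ($lines) = $output =~ /^\s*(\d+)/
        or die "Unexpected wc output: $output";
    return $lines;
}
```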

Re: How To Count Lines In File?
by Anonymous Monk on Jan 27, 2003 at 12:43 UTC

sysread of a fixed block (adjusting the 4096 argument to match the block size used by your OS provides an additional optimization, BTW) is more efficient than asking perl to do that in the background and then scan until \n into sub-buffers, returning them one at a time. tr just scans once straight through the whole buffer and returns the count as a side effect.
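One way to pick the block size rather than hard-coding it is to ask stat, whose twelfth field (index 11) is the preferred I/O block size for the file. A minimal sketch, with a hypothetical sub name and a 4096 fallback for systems where that field is empty or zero; whether it is measurably faster is something to benchmark rather than assume.

```perl
use strict;
use warnings;

sub count_newlines_blksize {
    my ($path) = @_;
    open(my $fh, '<', $path) or die "Can't open '$path': $!";
    binmode($fh);
    # stat field 11 is the preferred block size; fall back to 4096 if unavailable.
    my $blksize = (stat($fh))[11] || 4096;
    my ($lines, $buffer) = (0, '');
    while (sysread($fh, $buffer, $blksize)) {
        $lines += ($buffer =~ tr/\n//);
    }
    close($fh);
    return $lines;
}
```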

Re: How To Count Lines In File?
by Limbic~Region (Chancellor) on Jan 27, 2003 at 18:11 UTC

The only reason I am contributing to what seems like a complete thread is that I ran into a similar dilemma that had real-world impact. In trying to solve my problem (which is too long to get into here), I decided to check out the Unix Reconstruction Project at Perl Power Tools and found that tcgrep was blazing fast in comparison to anything I was capable of writing.

I spent several days ripping out lines of code until I found what I was looking for and splicing it into my code with a few more optimizations for my very specific environment and was able to actually beat the compiled Unix grep (albeit in a very specific race).

So - if you are trying to count lines, words, characters, paragraphs, or a few other things - I would suggest checking out this.

UPDATE: PPT's port of wc does not use the ultra streamlined version of counting lines in a file, but it does offer all kinds of other support such as UTF support for word counting, etc - that is why I felt it was worth mentioning!

Cheers - L~R

Re: How To Count Lines In File?
by pg (Canon) on Jan 27, 2003 at 18:00 UTC
