
Interesting. I ran a similar, partly synthetic benchmark yesterday and thought about publishing it, mainly to advise against my own "seek" solution as too slow, then decided not to :), since it might not be worth readers' effort.

Nevertheless, I get somewhat different results for a 1-million-line file on fast NVMe SSD storage. Below is the case that returns a hash of character counts; the results are similar when returning a string.

$ perl vert2.pl
ok 1 - same results
ok 2 - same results
ok 3 - same results
            (warning: too few iterations for a reliable count)
            (warning: too few iterations for a reliable count)
            (warning: too few iterations for a reliable count)
            (warning: too few iterations for a reliable count)
          Rate   seek    buk substr  slurp
seek   0.920/s     --   -61%   -84%   -88%
buk     2.36/s   157%     --   -58%   -69%
substr  5.66/s   515%   140%     --   -26%
slurp   7.69/s   736%   226%    36%     --
1..3
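
The "too few iterations" warnings above appear with a fixed count of 3. As a hedged aside: Benchmark documents that a negative count is read as a minimum number of CPU seconds per sub, which lets the module choose the iteration count itself. A minimal sketch, where `by_substr`, `by_regex` and the in-memory `@lines` are hypothetical stand-ins rather than the file-reading subs of this post:

```perl
use strict;
use warnings;
use Benchmark 'cmpthese';

# Hypothetical in-memory data set: 10_000 identical 40-character lines.
my @lines = map { 'ACGT' x 10 } 1 .. 10_000;

# Count the character at 1-based column 10 with substr.
sub by_substr { my %c; $c{ substr $_, 9, 1 }++ for @lines; return \%c }

# Same count, but capturing the character with a regex.
sub by_regex  { my %c; /^.{9}(.)/ and $c{$1}++ for @lines; return \%c }

# Negative count: run each sub for at least 1 CPU second
# instead of a fixed number of iterations.
my $rows = cmpthese( -1, {
    substr => \&by_substr,
    regex  => \&by_regex,
});
```

A slow sub may still complete only a handful of iterations in the allotted time, so the warning can persist; raising the number of CPU seconds is the obvious knob in that case.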

use strict;
use warnings;
use feature 'say';
use String::Random 'random_regex';
use Benchmark 'cmpthese';
use Test::More 'no_plan';

my $fn  = 'dna.txt';
my $POS = 10;

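unless ( -e $fn ) {
    open my $fh, '>', $fn or die "Cannot write $fn: $!";
    # 1e6 lines of 42 random bases; with "\n" each line is 43 bytes
    print $fh random_regex( '[ACTG]{42}' ), "\n"
        for 1 .. 1e6;
    close $fh or die "Cannot close $fn: $!";
}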

is_deeply _seek(), _substr(), 'same results';
is_deeply slurp(), _substr(), 'same results';
is_deeply buk(),   _substr(), 'same results';

cmpthese( 3, {
    substr => \&_substr,
    seek   => \&_seek,
    buk    => \&buk,
    slurp  => \&slurp,
});


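# Read the whole file into one string, then index into it at a 43-byte stride.
sub slurp {
    open my $fh, '<', $fn or die "Cannot open $fn: $!";
    my $s = do { local $/ = undef; <$fh> };
    my $count;
    $count-> { substr $s, $POS - 1 + 43 * $_, 1 }++
        for 0 .. length( $s ) / 43 - 1;
    return $count
}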

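# Reuse one buffer; $ref is a reference to a 1-character substr lvalue of $buf,
# so $$ref yields column $POS of whatever line was last copied into $buf.
sub buk {
    open my $fh, '<', $fn or die "Cannot open $fn: $!";
    my $buf = chr( 0 ) x 43;
    my $ref = \substr( $buf, $POS - 1, 1 );
    my $count;
    until ( eof $fh ) {
        substr( $buf, 0 ) = <$fh>;
        $count-> { $$ref }++
    }
    return $count
}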

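# Read a single character per line and seek past the rest of it.
sub _seek {
    open my $fh, '<', $fn or die "Cannot open $fn: $!";
    # Line length (including "\n") minus the one byte that getc consumes
    my $L = length( <$fh> ) - 1;
    seek $fh, $POS - 1, 0;
    my $count;
    until ( eof $fh ) {
        $count-> { getc $fh }++;
        seek $fh, $L, 1
    }
    return $count
}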

# Baseline: read line by line and take the character with substr.
sub _substr {
    open my $fh, '<', $fn or die "Cannot open $fn: $!";
    my $count;
    $count-> { substr $_, $POS - 1, 1 }++
        while <$fh>;
    return $count
}

$ perl -v

This is perl 5, version 26, subversion 0 (v5.26.0) built for x86_64-linux-thread-multi
(with 1 registered patch, see perl -V for more detail)
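
For the string-returning case mentioned above, a minimal sketch might look like the following. It assumes the wanted result is simply the character at a given column of every line, concatenated in file order; `column_string` and the three-line temporary file are hypothetical, made up for illustration.

```perl
use strict;
use warnings;
use File::Temp 'tempfile';

# Hypothetical helper: collect the character at 1-based column $pos
# of every line into one string.
sub column_string {
    my ( $file, $pos ) = @_;
    open my $fh, '<', $file or die "Cannot open $file: $!";
    my $out = '';
    while ( my $line = <$fh> ) {
        $out .= substr $line, $pos - 1, 1;
    }
    close $fh;
    return $out;
}

# Tiny throwaway input file so the sketch runs on its own.
my ( $tmp, $tmpname ) = tempfile( UNLINK => 1 );
print {$tmp} "ACGT\n", "CGTA\n", "GTAC\n";
close $tmp or die "Cannot close $tmpname: $!";

print column_string( $tmpname, 2 ), "\n";    # prints "CGT"
```

The loop has the same shape as `_substr` above, with string concatenation in place of the hash increment.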

In reply to Re^2: Faster and more efficient way to read a file vertically by vr
in thread Faster and more efficient way to read a file vertically by Anonymous Monk
