Interesting. I ran a similar, partly synthetic benchmark yesterday and thought about publishing it, mainly to advise against my "seek" solution as too slow, but then decided not to :), because it might not be worth readers' effort.

Nevertheless, I get somewhat different results for a 1-million-line file on fast NVMe SSD storage. Below is the case for returning a hash of character counts, but the picture is similar when returning a string.

$ perl vert2.pl
ok 1 - same results
ok 2 - same results
ok 3 - same results
            (warning: too few iterations for a reliable count)
            (warning: too few iterations for a reliable count)
            (warning: too few iterations for a reliable count)
            (warning: too few iterations for a reliable count)
          Rate   seek    buk substr  slurp
seek   0.920/s     --   -61%   -84%   -88%
buk     2.36/s   157%     --   -58%   -69%
substr  5.66/s   515%   140%     --   -26%
slurp   7.69/s   736%   226%    36%     --
1..3
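For anyone reading the table for the first time: as far as I can tell, each percentage is the row's rate relative to the column's rate, minus 100%. The sketch below recomputes two cells from the rounded rates printed above. The `%rate` hash and the `pct` helper are names I made up for illustration; they are not part of Benchmark.

```perl
use strict;
use warnings;

# Rates (iterations per second) copied from the table above.
my %rate = ( seek => 0.920, buk => 2.36, substr => 5.66, slurp => 7.69 );

# Hypothetical helper: how much faster (+) or slower (-) the row entry
# is than the column entry, as a rounded percentage.
sub pct {
    my ( $row, $col ) = @_;
    return sprintf '%.0f%%', 100 * ( $rate{$row} / $rate{$col} - 1 );
}

print 'slurp vs seek:   ', pct( 'slurp',  'seek' ),  "\n";
print 'substr vs slurp: ', pct( 'substr', 'slurp' ), "\n";
```

So "736%" in the slurp row means slurp ran a bit more than 8 times as fast as seek here, not 7 times.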

use strict;
use warnings;
use feature 'say';
use String::Random 'random_regex';
use Benchmark 'cmpthese';
use Test::More 'no_plan';

my $fn  = 'dna.txt';
my $POS = 10;    # 1-based column to count

# Create the test file once: 1e6 lines of 42 random bases each.
unless ( -e $fn ) {
    open my $fh, '>', $fn;
    print $fh random_regex( '[ACTG]{42}' ), "\n" for 1 .. 1e6;
}

is_deeply _seek(), _substr(), 'same results';
is_deeply slurp(), _substr(), 'same results';
is_deeply buk(),   _substr(), 'same results';

cmpthese( 3, {
    substr => \&_substr,
    seek   => \&_seek,
    buk    => \&buk,
    slurp  => \&slurp,
});

# Read the whole file, then pick one char per 43-byte record
# (42 bases plus the newline).
sub slurp {
    open my $fh, '<', $fn;
    my $s = do { local $/ = undef; <$fh> };
    my $count;
    $count-> { substr $s, $POS - 1 + 43 * $_, 1 }++
        for 0 .. length( $s ) / 43 - 1;
    return $count
}

# Copy each line into a fixed buffer and read the column through
# a reference to a substr lvalue.
sub buk {
    open my $fh, '<', $fn;
    my $buf = chr( 0 ) x 43;
    my $ref = \substr( $buf, $POS - 1, 1 );
    my $count;
    until ( eof $fh ) {
        substr( $buf, 0 ) = <$fh>;
        $count-> { $$ref }++
    }
    return $count
}

# Read one char with getc, then skip ahead to the same column
# of the next line.
sub _seek {
    open my $fh, '<', $fn;
    my $L = length( <$fh> ) - 1;    # getc consumes 1 byte, seek skips the rest
    seek $fh, $POS - 1, 0;
    my $count;
    until ( eof $fh ) {
        $count-> { getc $fh }++;
        seek $fh, $L, 1
    }
    return $count
}

# Plain line-by-line read with substr.
sub _substr {
    open my $fh, '<', $fn;
    my $count;
    $count-> { substr $_, $POS - 1, 1 }++ while <$fh>;
    return $count
}
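In case the trick in buk() is not obvious: a reference taken to a substr lvalue keeps pointing at the same offset of the buffer, so dereferencing it after the buffer has been overwritten yields the new character at that column. Below is a minimal sketch on made-up data. The three sample lines and the `$pos` value are my assumptions for the demo, not taken from the benchmark file.

```perl
use strict;
use warnings;

# Hypothetical sample data: three fixed-width lines.
my @lines = ( "ACGTAC\n", "TTGACA\n", "GCATGG\n" );
my $pos   = 3;    # 1-based column to sample (assumption for this demo)

my $buf = "\0" x length $lines[0];         # fixed-size buffer
my $ref = \substr( $buf, $pos - 1, 1 );    # reference to a substr lvalue

my %count;
for my $line (@lines) {
    substr( $buf, 0 ) = $line;    # overwrite the buffer contents
    $count{$$ref}++;              # $$ref reads column $pos of the current buffer
}

print join( ' ', map { "$_=$count{$_}" } sort keys %count ), "\n";
```

This relies on all lines having the same length, which holds for the generated dna.txt but not for arbitrary input.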

$ perl -v

This is perl 5, version 26, subversion 0 (v5.26.0) built for x86_64-linux-thread-multi
(with 1 registered patch, see perl -V for more detail)

In reply to Re^2: Faster and more efficient way to read a file vertically by vr
in thread Faster and more efficient way to read a file vertically by Anonymous Monk
