The following provides a parallel version of the slurp routine. Running MCE via cmpthese reports MCE as 300x faster, which is wrong; I'm not sure why or where to look. So I needed to benchmark another way, timing each routine by wall clock.

Regarding MCE, each worker receives the next chunk and tallies into a local hash, then updates the shared hash.

use strict;
use warnings;

use MCE;
use MCE::Shared;
use String::Random 'random_regex';
use Time::HiRes 'time';

my $fn  = 'dna.txt';
my $POS = 10;
my $shrcount = MCE::Shared->hash();
my $mce;

unless ( -e $fn ) {
    open my $fh, '>', $fn;
    print $fh random_regex( '[ACTG]{42}' ), "\n" for 1 .. 1e6;
}

sub slurp {
    open my $fh, '<', $fn;
    my $s = do { local $/ = undef; <$fh> };
    my $count;
    $count-> { substr $s, $POS - 1 + 43 * $_, 1 }++
        for 0 .. length( $s ) / 43 - 1;
    return $count
}

sub mce {
    unless ( defined $mce ) {
        $mce = MCE->new(
            max_workers => 4,
            chunk_size  => '300k',
            use_slurpio => 1,
            user_func   => sub {
                my ( $mce, $slurp_ref, $chunk_id ) = @_;
                my ( $count, @todo );
                $count-> { substr ${ $slurp_ref }, $POS - 1 + 43 * $_, 1 }++
                    for 0 .. length( ${ $slurp_ref } ) / 43 - 1;

                # Each key involves one IPC trip to the shared-manager.
                #
                # $shrcount->incrby( $_, $count->{$_} )
                #     for ( keys %{ $count } );

                # The following is faster for smaller chunk size.
                # Basically, send multiple commands at once.
                #
                push @todo, [ "incrby", $_, $count->{$_} ]
                    for ( keys %{ $count } );

                $shrcount->pipeline( @todo );
            }
        )->spawn();
    }

    $shrcount->clear();
    $mce->process($fn);

    return $shrcount->export();
}

for (qw/ slurp mce /) {
    no strict 'refs';
    my $start = time();
    my $func = "main::$_";
    $func->() for 1 .. 3;
    printf "%5s: %0.03f secs.\n", $_, time() - $start;
}

__END__

slurp: 0.487 secs.
  mce: 0.149 secs.
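For anyone puzzling over the 43 in the code above: each record is 42 bases plus a newline, so the character at column $POS of record N sits at offset $POS - 1 + 43 * N. Here is a small core-Perl sketch of that arithmetic on a toy buffer, followed by the "tally locally, then merge" step done serially. The helper name tally_column and the toy data are mine, not part of the benchmark, and the merge only matches the whole-buffer tally on the assumption that every chunk ends on a record boundary.

```perl
use strict;
use warnings;

# Hypothetical helper (not in the original post): tally the character
# found at 1-based column $pos of every fixed-width record in $buf.
# $reclen includes the trailing newline (42 bases + "\n" = 43 above).
sub tally_column {
    my ( $buf, $pos, $reclen ) = @_;
    my %count;
    my $records = int( length($buf) / $reclen );
    $count{ substr $buf, $pos - 1 + $reclen * $_, 1 }++
        for 0 .. $records - 1;
    return \%count;
}

# Three 4-character records, each followed by a newline (record length 5).
my $whole = tally_column( "ACGT\nAGGT\nTCGA\n", 2, 5 );

# Same data split into two chunks on a record boundary: tally each chunk
# into its own hash, then fold the partial counts into one total. This
# mirrors what each worker does before updating the shared hash.
my %total;
for my $chunk ( "ACGT\nAGGT\n", "TCGA\n" ) {
    my $local = tally_column( $chunk, 2, 5 );
    $total{$_} += $local->{$_} for keys %{ $local };
}

print join( ', ', map { "$_=$whole->{$_}" } sort keys %{ $whole } ), "\n";
print join( ', ', map { "$_=$total{$_}" }  sort keys %total ), "\n";
```

Both lines print the same counts for column 2 (C, G, C), so summing per-chunk tallies gives the same answer as tallying the whole buffer at once.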

In reply to Re^3: Faster and more efficient way to read a file vertically by marioroy
in thread Faster and more efficient way to read a file vertically by Anonymous Monk
