in reply to Re: Count byte/character occurrence (quickly)
in thread Count byte/character occurrence (quickly)

The following are parallel demonstrations using MCE::Hobo and threads.

MCE::Hobo and MCE::Shared

A Hobo is a migratory worker inside the machine that carries the asynchronous gene. Hobos are equipped with threads-like capability for running code asynchronously. Unlike threads, each hobo is a unique process to the underlying OS. The IPC is managed by MCE::Shared, which runs on all the major platforms including Cygwin.

    use strict;
    use warnings;

    use MCE::Hobo;
    use MCE::Shared;
    use Time::HiRes qw[ time ];

    my $start = time;

    my $fh   = MCE::Shared->handle( "<:raw", $ARGV[ 0 ] );
    my $seen = MCE::Shared->array;

    sub task {
        my @_seen;
        while( read( $fh, my $buf, 16384 * 4 ) ) {
            # the length check may be omitted with MCE::Shared 1.002+
            last unless length($buf);
            ++$_seen[$_] for unpack 'C*', $buf;
        }
        for ( 0 .. 255 ) {
            $seen->incrby($_, $_seen[$_]) if $_seen[$_];
        }
    }

    MCE::Hobo->create('task') for 1 .. 8;

    # do other stuff if desired

    $_->join for MCE::Hobo->list;

    close $fh;

    printf "Took %f secs\n", time() - $start;

    # export and destroy the shared array into a local non-shared array
    $seen = $seen->destroy;

    # for ( 0 .. 255 ) {
    #     printf "%c : %u\n", $_, $seen->[$_] if $seen->[$_];
    # }

threads and MCE::Shared

The code for MCE::Hobo and threads is very similar.

    use strict;
    use warnings;

    use threads;
    use MCE::Shared;
    use Time::HiRes qw[ time ];

    my $start = time;

    my $fh   = MCE::Shared->handle( "<:raw", $ARGV[ 0 ] );
    my $seen = MCE::Shared->array;

    sub task {
        my @_seen;
        while( read( $fh, my $buf, 16384 * 4 ) ) {
            # the length check may be omitted with MCE::Shared 1.002+
            last unless length($buf);
            ++$_seen[$_] for unpack 'C*', $buf;
        }
        for ( 0 .. 255 ) {
            $seen->incrby($_, $_seen[$_]) if $_seen[$_];
        }
    }

    threads->create('task') for 1 .. 8;

    # do other stuff if desired

    $_->join for threads->list;

    close $fh;

    printf "Took %f secs\n", time() - $start;

    # export and destroy the shared array into a local non-shared array
    $seen = $seen->destroy;

    # for ( 0 .. 255 ) {
    #     printf "%c : %u\n", $_, $seen->[$_] if $seen->[$_];
    # }
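
Both versions follow the same pattern: each worker tallies bytes into a private array and touches the shared array only once, at the end, which keeps IPC traffic low. The following is a minimal single-process sketch of that pattern in core Perl, with no MCE involved. The names tally_chunk and merge_tally are hypothetical helpers made up for illustration, and the two literal strings stand in for buffers a worker would read from the file.

```perl
use strict;
use warnings;

# Hypothetical helper: count byte values in one buffer into a private array.
sub tally_chunk {
    my ($buf) = @_;
    my @tally = (0) x 256;
    ++$tally[$_] for unpack 'C*', $buf;
    return \@tally;
}

# Hypothetical helper: fold one private tally into the running totals.
# In the demonstrations above this step is the incrby loop on the shared array.
sub merge_tally {
    my ( $total, $tally ) = @_;
    $total->[$_] += $tally->[$_] for 0 .. 255;
    return $total;
}

my @total = (0) x 256;

# Stand-ins for the chunks that separate workers would read.
for my $chunk ( "abca", "ab\n" ) {
    merge_tally( \@total, tally_chunk($chunk) );
}

# Print only the byte values that occurred.
printf "%3d : %u\n", $_, $total[$_] for grep { $total[$_] } 0 .. 255;
```

Merging once per worker means at most 256 shared updates per worker, rather than one shared update per byte read.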

Re^3: Count byte/character occurrence (quickly)
by james28909 (Deacon) on Apr 01, 2016 at 16:26 UTC

    You guys are awesome. Thanks for these good examples :) When I run this code, it calculates byte occurrence in .9 secs or less!

    Edit: I do have a few questions as well. I don't have time to ask right now, but I will be back!
      Just noticed that you were the author of that module, haha. Anyway, I have read a file into a buffer and then opened it like:
          my $arg = shift;
          my $len = -s $arg;

          open my $file, '<', $arg;
          binmode $file;
          read $file, my $buf, $len;
          close $file;

          open my $mem_file, '<', \$buf;
          binmode $mem_file;

          .....do stuff....
      When I try to use mce_open with $mem_file, I get an error:
          open error: Invalid argument at C:/Perl/site/lib/MCE/Shared/Server.pm line 1035 thread 1, <__ANONIO__> line 6.
              MCE::Shared::Server::__ANON__() called at C:/Perl/site/lib/MCE/Shared/Server.pm line 1324 thread 1
              MCE::Shared::Server::_loop(0, 6624) called at C:/Perl/site/lib/MCE/Shared/Server.pm line 335 thread 1
              eval {...} called at C:/Perl/site/lib/MCE/Shared/Server.pm line 335 thread 1

      Is there any way I can get this to work? I have passed this $mem_file around in my script and would like to use it instead of having to re-read the actual file. If I need to elaborate any more, please let me know :)

      EDIT: I will go ahead and elaborate a little more. When I pass $mem_file to the sub and try to open it like this:

          stat_check($mem_file);

          sub stat_check{
              my ($mem_file) = @_;
              my $fh = MCE::Shared->handle( "<:raw", \$mem_file );
              ....rest of threaded function...
          }

      I get this error:

          Not a GLOB reference at C:/Perl/site/lib/MCE/Shared/Server.pm line 2036, <__ANONIO__> line 3.

      If I try:

          stat_check($mem_file);

          sub stat_check{
              my ($mem_file) = @_;
              my $fh = MCE::Shared->handle( "<:raw", $mem_file );
              ....rest of threaded function...
          }

      I get this error:

          open error: Invalid argument at C:/Perl/site/lib/MCE/Shared/Server.pm line 1035 thread 1, <__ANONIO__> line 6.
              MCE::Shared::Server::__ANON__() called at C:/Perl/site/lib/MCE/Shared/Server.pm line 1324 thread 1
              MCE::Shared::Server::_loop(0, 3232) called at C:/Perl/site/lib/MCE/Shared/Server.pm line 335 thread 1
              eval {...} called at C:/Perl/site/lib/MCE/Shared/Server.pm line 335 thread 1

        Unfortunately, MCE::Shared does not support a handle obtained by opening a scalar reference. A shared file handle must be backed by a real file descriptor, which an in-memory handle does not have.
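
        The difference can be seen with fileno, which returns -1 for an in-memory handle. The sketch below uses only core Perl: it checks both kinds of handle, then spills the buffer to a temporary file so that a path is available again. The buffer contents are a stand-in, and handing $tmp_path to MCE::Shared->handle the way the earlier examples hand over $ARGV[ 0 ] is an assumption that has not been tested here. Writing a temporary file costs extra I/O, so passing the original file name around instead of $mem_file may well be the simpler option.

        ```perl
        use strict;
        use warnings;
        use File::Temp qw( tempfile );

        # Stand-in for the buffer that was read from the real file.
        my $buf = "example bytes\n";

        # An in-memory handle has no OS-level descriptor, so fileno reports -1.
        open my $mem_fh, '<', \$buf or die "open in-memory handle: $!";
        binmode $mem_fh;
        my $mem_fd = fileno $mem_fh;

        # Spill the buffer to a temporary file (removed automatically at exit).
        my ( $tmp_fh, $tmp_path ) = tempfile( UNLINK => 1 );
        binmode $tmp_fh;
        print {$tmp_fh} $buf or die "write $tmp_path: $!";
        close $tmp_fh or die "close $tmp_path: $!";

        # A handle opened on the path is backed by a real descriptor.
        open my $real_fh, '<:raw', $tmp_path or die "open $tmp_path: $!";
        my $real_fd = fileno $real_fh;

        printf "in-memory fd: %d, temp file fd: %d\n", $mem_fd, $real_fd;

        # Assumption (untested): $tmp_path could now be given to
        # MCE::Shared->handle( "<:raw", $tmp_path ) in place of $mem_file.
        ```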