Update: I am providing updated results because background processes were running during the earlier tests. After rebooting my laptop, I noticed that things ran faster, which meant re-running all the tests. Included are results for the upcoming MCE 1.706 release with faster IO (applies to use_slurpio => 1). Previously, I was unable to get below 3.0 seconds on the Mac with MCE 1.705. The run time is 2.2 seconds with MCE 1.706, which is close to the underlying hardware limit. MCE 1.706 will be released soon.
I ran the same tests from a Linux VM via Parallels Desktop, with the 2 GB plain text file residing on a virtual disk inside Fedora 22. The binary grep command runs much faster under Linux than it does on OS X.
## FS cache purged inside Linux and on Mac OS X before running.
wc -l       : 1.732 secs.  from virtual disk
grep -c     : 1.912 secs.  from virtual disk
total       : 3.644 secs.

wc -l       : 1.732 secs.  from virtual disk
grep -c     : 0.884 secs.  from FS cache
total       : 2.616 secs.

Perl script : 3.910 secs.  non-MCE using 1 core

              MCE 1.705    MCE 1.706
with MCE    : 4.357 secs.  4.015 secs.  using 1 core
with MCE    : 3.228 secs.  2.979 secs.  using 2 cores
with MCE    : 2.884 secs.  2.624 secs.  using 3 cores
with MCE    : 2.908 secs.  2.501 secs.  using 4 cores
## Dictionary2GB.txt residing inside FS cache on Linux.
wc -l       : 1.035 secs.
grep -c     : 0.866 secs.
total       : 1.901 secs.

Perl script : 2.314 secs.  non-MCE using 1 core

              MCE 1.705    MCE 1.706
with MCE    : 2.344 secs.  2.337 secs.  using 1 core
with MCE    : 1.349 secs.  1.345 secs.  using 2 cores
with MCE    : 0.961 secs.  0.932 secs.  using 3 cores
with MCE    : 0.820 secs.  0.775 secs.  using 4 cores
On Linux with the FS cache purged, it takes at least 3 workers (2.624 secs. with MCE 1.706) to run about as fast as wc and grep combined with grep reading from FS cache (2.616 secs.).
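The scaling in the FS-cache results can be checked with a little arithmetic. This is only a minimal sketch using the MCE 1.706 timings quoted above; the helper name `speedup` is hypothetical and introduced purely for illustration.

```perl
use strict;
use warnings;

# MCE 1.706 timings (secs.) with the file in FS cache, keyed by worker count.
my %secs = ( 1 => 2.337, 2 => 1.345, 3 => 0.932, 4 => 0.775 );

# Hypothetical helper: speedup of a run relative to a baseline time.
sub speedup {
    my ( $base, $time ) = @_;
    return $base / $time;
}

for my $workers ( sort { $a <=> $b } keys %secs ) {
    my $s = speedup( $secs{1}, $secs{$workers} );
    printf "%d worker(s): %.2fx speedup, %.0f%% efficiency\n",
        $workers, $s, 100 * $s / $workers;
}
```

By this measure, 4 workers come out at roughly 3x the single-worker MCE run.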
Below are the serial code and the MCE code, respectively.
use strict;
use warnings;

my $file = '/home/mario/Dictionary2GB.txt';
my $size = 24 * 1024 * 1024;    # read in 24 MB blocks
my ( $numlines, $occurrences ) = ( 0, 0 );

open my $fh, '<', $file or die "Cannot open $file: $!";

while ( read( $fh, my $buf, $size ) ) {
    # A block usually ends mid-line; append through the next newline
    # so that every line stays whole.
    $buf .= <$fh> unless ( eof $fh );
    $numlines    += $buf =~ tr/\n//;
    $occurrences += () = $buf =~ /123456\r?$/mg;
}

close $fh;

print "Num lines  : $numlines\n";
print "Occurrences: $occurrences\n";
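The read-then-append step in the loop is what keeps the counts correct: a fixed-size read usually stops mid-line, so a line ending in 123456 could otherwise be split across two blocks and missed. Here is a minimal sketch of that idea on a small in-memory file. The sub name `count_in_blocks` and the sample data are made up for illustration and are not part of the benchmark.

```perl
use strict;
use warnings;

# Hypothetical helper: count lines and lines ending in 123456,
# reading $size bytes at a time from the string $data.
sub count_in_blocks {
    my ( $data, $size ) = @_;
    my ( $numlines, $matches ) = ( 0, 0 );

    # Open the string as an in-memory file.
    open my $fh, '<', \$data or die "Cannot open in-memory file: $!";

    while ( read( $fh, my $buf, $size ) ) {
        # Extend the block through the next newline so no line is split.
        $buf .= <$fh> unless eof $fh;
        $numlines += $buf =~ tr/\n//;
        $matches  += () = $buf =~ /123456\r?$/mg;
    }

    close $fh;
    return ( $numlines, $matches );
}

# Made-up sample: three lines, two ending in 123456 (one with CRLF).
my $sample = "abc123456\nxyz\nfoo123456\r\n";

# A 5-byte block size forces every read to stop mid-line.
my ( $lines, $hits ) = count_in_blocks( $sample, 5 );
print "Num lines : $lines\n";
print "Matches   : $hits\n";
```

With this sample, a 5-byte block size and a block size larger than the whole string should both report 3 lines and 2 matches, since each block is always extended to a line boundary before counting.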
Using MCE to run on multiple cores:
use strict;
use warnings;

use MCE::Flow;
use MCE::Shared;

# Counters shared across workers.
my $counter1 = MCE::Shared->scalar( 0 );    # line count
my $counter2 = MCE::Shared->scalar( 0 );    # occurrence count

mce_flow_f {
    chunk_size  => '24m', max_workers => 4,
    use_slurpio => 1,
},
sub {
    my ( $mce, $chunk_ref, $chunk_id ) = @_;

    # With use_slurpio, $chunk_ref is a reference to the raw chunk string.
    my $numlines    = $$chunk_ref =~ tr/\n//;
    my $occurrences = () = $$chunk_ref =~ /123456\r?$/mg;

    $counter1->incrby( $numlines );
    $counter2->incrby( $occurrences );
}, "/home/mario/Dictionary2GB.txt";

print "Num lines  : ", $counter1->get(), "\n";
print "Occurrences: ", $counter2->get(), "\n";
Kind regards, Mario.