MCE::Loop might also be an option:
#!/usr/bin/env perl
use warnings;
use strict;
use MCE::Loop;
use feature qw(say);
my $cpus = MCE::Util->get_ncpu() || 4;
MCE::Loop::init { max_workers => $cpus, };
my %barcode_hash = (
    1 => [ 'AGCTCGTTGTTCGATCCA', 'GAGAGATAGATGATAGTG', 'TTTT_CCCC', 0 ],
    2 => [ 'AGCTCGTTGTTCGATCCA', 'GAGAGATAGATGATAGTG', 'TTTT_AAAA', 0 ],
    3 => [ 'AGCTCGTTGTTCGATCCA', 'GAGAGATAGATGATAGTG', 'TTTT_BBBB', 0 ],
    4 => [ 'AGCTCGTTGTTCGATCCA', 'GAGAGATAGATGATAGTG', 'TTTT_AAAA', 0 ],
);
my $barcode_pair_35 = 'TTTT_AAAA';
mce_loop {
    my ( $mce, $chunk_ref, $chunk_id ) = @_;
    for (@$chunk_ref) {
        if ( $barcode_hash{$_}[2] eq $barcode_pair_35 ) {
            say qq(Found $barcode_hash{$_}[2] at $_);
        }
    }
} keys %barcode_hash;
__END__
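If the matches are needed back in the parent process rather than just printed by the workers, something like the following might do. It is only a sketch: it assumes the same toy hash as above, the worker count and chunk size are arbitrary, and @found is a name I made up. MCE->gather sends the hits back, and mce_loop returns whatever was gathered.

```perl
#!/usr/bin/env perl
use strict;
use warnings;
use feature qw(say);
use MCE::Loop;

# Arbitrary settings for the toy data set; tune them for real data.
MCE::Loop::init { max_workers => 2, chunk_size => 2 };

my %barcode_hash = (
    1 => [ 'AGCTCGTTGTTCGATCCA', 'GAGAGATAGATGATAGTG', 'TTTT_CCCC', 0 ],
    2 => [ 'AGCTCGTTGTTCGATCCA', 'GAGAGATAGATGATAGTG', 'TTTT_AAAA', 0 ],
    3 => [ 'AGCTCGTTGTTCGATCCA', 'GAGAGATAGATGATAGTG', 'TTTT_BBBB', 0 ],
    4 => [ 'AGCTCGTTGTTCGATCCA', 'GAGAGATAGATGATAGTG', 'TTTT_AAAA', 0 ],
);
my $barcode_pair_35 = 'TTTT_AAAA';

# Each worker filters its chunk and gathers the matching keys.
# mce_loop returns the gathered items to the parent.
my @found = mce_loop {
    my ( $mce, $chunk_ref, $chunk_id ) = @_;
    my @hits = grep { $barcode_hash{$_}[2] eq $barcode_pair_35 } @$chunk_ref;
    MCE->gather(@hits) if @hits;
} keys %barcode_hash;

MCE::Loop::finish;

# Gather order is not guaranteed, so sort before reporting.
say "Found $barcode_pair_35 at $_" for sort { $a <=> $b } @found;
```

The explicit sort at the end is there because chunks can come back in any order.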
Update:
Ok, some benchmarking.
Playing around with $size (the chunk_size) might be worth the effort. Your mileage may vary. I hope I quoted haukex right and jumped to the right conclusions.
#!/usr/bin/env perl
use MCE::Loop;
use Benchmark qw ( :hireswallclock cmpthese timethese );
use strict;
use warnings;
use feature qw(say);
my $size = 10000;
say $size;
my $cpus = MCE::Util->get_ncpu() || 4;
MCE::Loop::init { max_workers => $cpus, chunk_size => $size };
my $data = [ 'AGCTCGTTGTTCGATCCA', 'GAGAGATAGATGATAGTG', 'TTTT_CCCC', 0 ];
our %barcode_hash = map { $_ => $data } 1 .. 99998;
$barcode_hash{99999} =
    [ 'AGCTCGTTGTTCGATCCA', 'GAGAGATAGATGATAGTG', 'TTTT_AAAA', 0 ];
$barcode_hash{100000} =
    [ 'AGCTCGTTGTTCGATCCA', 'GAGAGATAGATGATAGTG', 'TTTT_AAAA', 0 ];
our $barcode_pair_35 = 'TTTT_AAAA';
my $results = timethese(
    -10,
    {
        'karl'   => 'karl',
        'haukex' => 'haukex',
    }
);
cmpthese($results);
sub haukex {
    our %barcode_hash;
    our $barcode_pair_35;
    for my $key ( sort keys %barcode_hash ) {
        1 if $barcode_hash{$key}[2] eq $barcode_pair_35;
    }
}
sub karl {
    our %barcode_hash;
    our $barcode_pair_35;
    mce_loop {
        my ( $mce, $chunk_ref, $chunk_id ) = @_;
        for (@$chunk_ref) {
            1 if ( $barcode_hash{$_}[2] eq $barcode_pair_35 );
        }
    }
    keys %barcode_hash;
}
__END__
         Rate haukex   karl
haukex 6.74/s     --   -47%
karl   12.8/s    90%     --
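To see how much $size matters on a given machine, one could loop over a few chunk sizes and time the same scan. This is a rough sketch, not a proper benchmark: the three sizes, the five runs per size, and the names scan, @keys and %elapsed are my own assumptions, and the timings include worker startup.

```perl
#!/usr/bin/env perl
use strict;
use warnings;
use MCE::Loop;
use Time::HiRes qw(time);

my $data = [ 'AGCTCGTTGTTCGATCCA', 'GAGAGATAGATGATAGTG', 'TTTT_CCCC', 0 ];
our %barcode_hash    = map { $_ => $data } 1 .. 100_000;
our $barcode_pair_35 = 'TTTT_AAAA';
our @keys            = keys %barcode_hash;

my $cpus = MCE::Util->get_ncpu() || 4;

# Same comparison as in the benchmark above, wrapped in a named sub.
sub scan {
    mce_loop {
        my ( $mce, $chunk_ref, $chunk_id ) = @_;
        for (@$chunk_ref) {
            1 if $barcode_hash{$_}[2] eq $barcode_pair_35;
        }
    } @keys;
}

my %elapsed;
for my $size ( 1_000, 10_000, 50_000 ) {    # arbitrary candidate sizes
    MCE::Loop::init { max_workers => $cpus, chunk_size => $size };
    my $t0 = time;
    scan() for 1 .. 5;                      # several runs to smooth out startup
    $elapsed{$size} = time - $t0;
    MCE::Loop::finish;                      # shut down workers before the next size
    printf "chunk_size %6d: %.3f s for 5 runs\n", $size, $elapsed{$size};
}
```

The numbers depend heavily on core count and data size, so treat them as a hint for where to look and nothing more.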
Update 2: Shit! If I omit the sort in haukex's loop, I lose...
Update 3: Slightly different picture with 1,000,000 keys and with the key count calculated before benchmarking:
my $size = 10000;
my $cpus = MCE::Util->get_ncpu() || 4;
MCE::Loop::init { max_workers => $cpus, chunk_size => $size };
our $max = scalar keys %barcode_hash;
sub haukex {
    our %barcode_hash;
    our $barcode_pair_35;
    our $max;
    for ( 1 .. $max ) {
        1 if $barcode_hash{$_}[2] eq $barcode_pair_35;
    }
}
sub karl {
    our %barcode_hash;
    our $barcode_pair_35;
    our $max;
    mce_loop {
        my ( $mce, $chunk_ref, $chunk_id ) = @_;
        for (@$chunk_ref) {
            1 if ( $barcode_hash{$_}[2] eq $barcode_pair_35 );
        }
    }
    1 .. $max;
}
         Rate haukex   karl
haukex 2.29/s     --   -38%
karl   3.67/s    60%     --
Update 4: It's worth installing Sereal::Decoder (together with Sereal::Encoder).
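A quick way to check whether the Sereal modules are already there, using only core Perl. The helper name module_version is made up for this sketch. As far as I know, MCE picks up Sereal on its own when it is installed and otherwise falls back to Storable, but treat that as an assumption and check the MCE documentation.

```perl
#!/usr/bin/env perl
use strict;
use warnings;
use feature qw(say);

# Hypothetical helper: returns the installed version of a module,
# or undef if the module cannot be loaded.
sub module_version {
    my ($module) = @_;
    my $ok = eval "require $module; 1";    # string eval; names are hardcoded below
    return $ok ? ( $module->VERSION // 'unknown' ) : undef;
}

for my $module (qw(Sereal::Encoder Sereal::Decoder)) {
    my $version = module_version($module);
    say defined $version
        ? "$module $version is installed"
        : "$module is not installed";
}
```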
Regards, Karl
«The Crux of the Biscuit is the Apostrophe»
Furthermore I consider that Donald Trump must be impeached as soon as possible