Beefy Boxes and Bandwidth Generously Provided by pair Networks
Perl Monk, Perl Meditation
 
PerlMonks  

Re: search for particular elements of hash with multiple values

by marioroy (Prior)
on Apr 21, 2017 at 02:01 UTC ( [id://1188501]=note: print w/replies, xml ) Need Help??


in reply to search for particular elements of hash with multiple values

Hello pmpmmpmp and welcome to the monastery.

Update: Updated the demonstration. The karlhash function requires MCE 1.828 to be released soon.

After responding to karlgoethebier's post, realized that I wasn't incrementing the $bc_pair_num field. Running parallel may become important when the operation is complex and requires extra CPU time which is not the case here. For learning reasons, made this demo simply to better understand parallelization for this use case. Thank you. Also credits to several monks whom have responded with elegant solutions, which I borrowed in making this demonstration.

# to run perl demo.pl haukex perl demo.pl karlary perl demo.pl karlseq

Today, enjoyed writing Perl code.

#!/usr/bin/env perl # Re^2: search for particular elements of hash with multiple values # http://www.perlmonks.org/?node_id=1188488 use strict; use warnings; use feature qw ( say ); use MCE::Loop; use constant { _FOWPRIM => 0, _REVPRIM => 1, _BC_PAIR => 2, _BC_PNUM => 3 }; # Note: The karlhash function requires MCE 1.828 to be released soon. sub usage { warn "usage: $0 ( haukex | karlary | karlhash | karlseq )\n\n"; exit 1; } my $func_name = shift || usage(); usage() unless __PACKAGE__->can($func_name); my $max = 1000000; my $data = [ 'AGCTCGTTGTTCGATCCA', 'GAGAGATAGATGATAGTG', 'TTTT_CCCC', 0 ]; our %barcode_hash = map { $_ => $data } 1 .. $max - 2; $barcode_hash{ ($max - 1) } = [ 'AGCTCGTTGTTCGATCCA', 'GAGAGATAGATGATAGTG', 'TTTT_AAAA', 0 ]; $barcode_hash{ ($max) } = [ 'AGCTCGTTGTTCGATCCA', 'GAGAGATAGATGATAGTG', 'TTTT_AAAA', 0 ]; our $barcode_pair_35 = 'TTTT_AAAA'; $| = 1; { no strict qw ( refs ); my @found = $func_name->(); # increment the bc_pair_num field $barcode_hash{$_}[_BC_PNUM]++ for @found; # display summary and found keys printf "Found %d keys\n", scalar @found; say join ', ', sort { $a <=> $b } @found; } exit(0); sub haukex { # Serial code. my @found; for my $key (1 .. $max) { if ($barcode_hash{$key}[_BC_PAIR] eq $barcode_pair_35) { say "Found at $key"; push @found, $key; } } return @found; } sub karlary { # Workers receive next array chunk. MCE::Loop::init { max_workers => 4 }; my @found = mce_loop { my ($mce, $chunk_ref, $chunk_id) = @_; for my $key ( @$chunk_ref ) { if ($barcode_hash{$key}[_BC_PAIR] eq $barcode_pair_35) { MCE->say("Found at $key, chunk_id: $chunk_id"); MCE->gather($key); } } } 1 .. $max; MCE::Loop::finish(); return @found; } sub karlhash { # Workers receive next array chunk. MCE::Loop::init { max_workers => 4 }; # Must have MCE 1.828 or later to run this demonstration. # See http://www.perlmonks.org/?node_id=1188593 for a faster # solution using hashref as input by spawning workers early. # That will run better on the Windows platform. my @found = mce_loop { my ($mce, $chunk_ref, $chunk_id) = @_; for my $key ( keys %{ $chunk_ref } ) { if ($chunk_ref->{$key}[_BC_PAIR] eq $barcode_pair_35) { MCE->say("Found at $key, chunk_id: $chunk_id"); MCE->gather($key); } } } \%barcode_hash; MCE::Loop::finish(); return @found; } sub karlseq { # Workers receive next sequence 'begin' and 'end' boundaries. MCE::Loop::init { max_workers => 4, bounds_only => 1 }; # The bounds_only option applies to sequence which means for # workers to compute the begin and end boundaries only and # not the items in between. The result is lesser overhead. my @found = mce_loop_s { my ($mce, $chunk_ref, $chunk_id) = @_; for my $key ($chunk_ref->[0] .. $chunk_ref->[1]) { if ($barcode_hash{$key}[_BC_PAIR] eq $barcode_pair_35) { MCE->say("Found at $key, chunk_id: $chunk_id"); MCE->gather($key); } } } 1, $max; MCE::Loop::finish(); return @found; }

Regards, Mario.

Replies are listed 'Best First'.
Re^2: search for particular elements of hash with multiple values
by marioroy (Prior) on Apr 21, 2017 at 23:22 UTC

    The upcoming MCE update 1.828 allows a hash_ref as input_data. MCE workers may be spawned early to prevent Perl from making extra copies. I made this for a fellow Monk who pinged me to look at this thread. The OP's case does not require parallelization though. However, impoved MCE if one were to process a big hash later on involving complex operations. MCE 1.828 will be released soon. Among other things, signal handling is improved. Also 14% reduction in memory consumption made possible by loading Symbol, Fcntl, and File::Path on demand.

    #!/usr/bin/env perl # Re^2: search for particular elements of hash with multiple values # http://www.perlmonks.org/?node_id=1188593 use strict; use warnings; use feature qw ( say ); use MCE; use constant { _FOWPRIM => 0, _REVPRIM => 1, _BC_PAIR => 2, _BC_PNUM => 3 }; our $barcode_pair_35 = 'TTTT_AAAA'; # The core MCE API defaults to 1 for chunk_size. my $mce = MCE->new( max_workers => 4, chunk_size => 8000, user_func => sub { my ($mce, $chunk_ref, $chunk_id) = @_; for my $key ( keys %{ $chunk_ref } ) { if ($chunk_ref->{$key}[_BC_PAIR] eq $barcode_pair_35) { MCE->say("Found at $key"); MCE->gather($key); } } } )->spawn; my $max = 1000000; my $data = [ 'AGCTCGTTGTTCGATCCA', 'GAGAGATAGATGATAGTG', 'TTTT_CCCC', 0 ]; our %barcode_hash = map { $_ => $data } 1 .. $max - 2; $barcode_hash{ ($max - 1) } = [ 'AGCTCGTTGTTCGATCCA', 'GAGAGATAGATGATAGTG', 'TTTT_AAAA', 0 ]; $barcode_hash{ $max } = [ 'AGCTCGTTGTTCGATCCA', 'GAGAGATAGATGATAGTG', 'TTTT_AAAA', 0 ]; my @found; $mce->process( { gather => \@found }, \%barcode_hash ); $mce->shutdown; # increment bc_pair_num field $barcode_hash{$_}[_BC_PNUM]++ for @found; # display summary and found keys printf "Found %d keys\n", scalar @found; say join ', ', sort { $a <=> $b } @found;

Log In?
Username:
Password:

What's my password?
Create A New User
Domain Nodelet?
Node Status?
node history
Node Type: note [id://1188501]
help
Chatterbox?
and the web crawler heard nothing...

How do I use this?Last hourOther CB clients
Other Users?
Others wandering the Monastery: (6)
As of 2024-04-24 09:36 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?

    No recent polls found