Beefy Boxes and Bandwidth Generously Provided by pair Networks
No such thing as a small change
 
PerlMonks  

search for particular elements of hash with multiple values

by pmpmmpmp (Novice)
on Apr 13, 2017 at 19:33 UTC ( [id://1187870]=perlquestion: print w/replies, xml ) Need Help??

pmpmmpmp has asked for the wisdom of the Perl Monks concerning the following question:

I know that if you want to search a hash for a key you can use:

 exists($barcode_hash{$barcode_pair_35})

However, my hash has become more complicated:

 $barcode_hash{$prim_pair_number} = [ $fowprim, $revprim, $bc_pair, $bc_pair_num ];

How would I search through this hash to find a particular

$bc_pair

?

My hash looks like this:

$VAR1 = { '1' => [ 'AGCTCGTTGTTCGATCCA', 'GAGAGATAGATGATAGTG', 'TTTT_CCCC', 0 ], '2' => [ 'AGCTCGTTGTTCGATCCA', 'GAGAGATAGATGATAGTG', 'TTTT_AAAA', 0 ]

The thing I am looking for

$barcode_pair_35

is this TTTT_CCCC.

Also, when I find it, I would like to increment

 $bc_pair_num++

Replies are listed 'Best First'.
Re: search for particular elements of hash with multiple values
by haukex (Archbishop) on Apr 13, 2017 at 20:00 UTC

    Either scan your data or build a lookup table (see perlreftut and perldsc), as follows. Which one is better in terms of the speed/memory tradeoff depends on how big the tables are and how many lookups you are performing.

    use warnings; use strict; my %barcode_hash = ( 1 => ['AGCTCGTTGTTCGATCCA','GAGAGATAGATGATAGTG','TTTT_CCCC',0], 2 => ['AGCTCGTTGTTCGATCCA','GAGAGATAGATGATAGTG','TTTT_AAAA',0], 3 => ['AGCTCGTTGTTCGATCCA','GAGAGATAGATGATAGTG','TTTT_BBBB',0], 4 => ['AGCTCGTTGTTCGATCCA','GAGAGATAGATGATAGTG','TTTT_AAAA',0], ); my $barcode_pair_35 = 'TTTT_AAAA'; for my $key (sort keys %barcode_hash) { print "Found at $key\n" if $barcode_hash{$key}[2] eq $barcode_pair_35; } # - OR - my %lookup; for my $key (sort keys %barcode_hash) { push @{ $lookup{ $barcode_hash{$key}[2] } }, $key; } print "Looked up at ".join(", ",@{$lookup{$barcode_pair_35}})."\n"; __END__ Found at 2 Found at 4 Looked up at 2, 4

    Updated: Replaced the hashref $data with the hash %barcode_hash to bring the code into line with your examples.

      Thank you very much, haukex. These are very nice. I was wondering why you sort the keys in each example ?

        The sort is not necessary.

        Here is how I would approach it:

        use warnings; use strict; my %barcode_hash = ( 1 => ['AGCTCGTTGTTCGATCCA','GAGAGATAGATGATAGTG','TTTT_CCCC',0], 2 => ['AGCTCGTTGTTCGATCCA','GAGAGATAGATGATAGTG','TTTT_AAAA',0], 3 => ['AGCTCGTTGTTCGATCCA','GAGAGATAGATGATAGTG','TTTT_BBBB',0], 4 => ['AGCTCGTTGTTCGATCCA','GAGAGATAGATGATAGTG','TTTT_AAAA',0], ); Find_and_Increment ('TTTT_AAAA'); Find_and_Increment ('TTTT_CCCC'); Find_and_Increment ('TTTT_AAAA'); #-------------------- sub Find_and_Increment{ my ($pair) = @_; for my $k (keys %barcode_hash){ next unless ( my $aref = $barcode_hash{$k} ) -> [2] eq $pair; $aref->[3]++; print "$pair found at $k, $aref->[3] time(s)\n"; } }
        Output:
        $ perl search_hash.pl TTTT_AAAA found at 4, 1 time(s) TTTT_AAAA found at 2, 1 time(s) TTTT_CCCC found at 1, 1 time(s) TTTT_AAAA found at 4, 2 time(s) TTTT_AAAA found at 2, 2 time(s)

                ...Disinformation is not as good as datinformation.               Don't document the program; program the document.

        I was wondering why you sort the keys in each example ?

        Hashes are unordered, and keys will return the keys in a random order. Try removing the sort from my code, and you'll see that in the second code example, the arrays in the %lookup "hash of arrays" data structure will be in a random order across multiple runs of the program, and the output of both code examples will be in a random order across multiple runs. This may or may not be important in your case, I just did the sort keys so that the output would always be in the same order. This can also make testing easier, since tests wouldn't have to account for the randomness. If your hashes are large then the sort may slow down your program a bit. Personally I would suggest doing sort keys for the consistency, unless you know for a fact that the order doesn't matter in your case (e.g. if all you are doing is counting), or the speed impact becomes a problem.

Re: search for particular elements of hash with multiple values
by kcott (Archbishop) on Apr 14, 2017 at 04:46 UTC

    G'day pmpmmpmp,

    "... if you want to search a hash for a key ... my hash has become more complicated ..."

    Presumably you started with a simple hash; you now have a complex data structure called a Hash of Arrays (HoA). See "perldsc - Perl Data Structures Cookbook" for information on processing these, and other, complex data structures.

    From your description, you're no longer interested in the keys at all, just the values. You can perform your search and increment with this line:

    $_->[2] eq $barcode_pair_35 and ++$_->[3] for values %$VAR1;

    Here's my test code:

    #!/usr/bin/env perl use strict; use warnings; use Data::Dump; my $VAR1 = { '1' => [ 'AGCTCGTTGTTCGATCCA', 'GAGAGATAGATGATAGTG', 'TTTT_CCCC', +0 ], '2' => [ 'AGCTCGTTGTTCGATCCA', 'GAGAGATAGATGATAGTG', 'TTTT_AAAA', +0 ] }; dd $VAR1; my $barcode_pair_35 = 'TTTT_CCCC'; $_->[2] eq $barcode_pair_35 and ++$_->[3] for values %$VAR1; dd $VAR1;

    Output:

    { 1 => ["AGCTCGTTGTTCGATCCA", "GAGAGATAGATGATAGTG", "TTTT_CCCC", 0], 2 => ["AGCTCGTTGTTCGATCCA", "GAGAGATAGATGATAGTG", "TTTT_AAAA", 0], } { 1 => ["AGCTCGTTGTTCGATCCA", "GAGAGATAGATGATAGTG", "TTTT_CCCC", 1], 2 => ["AGCTCGTTGTTCGATCCA", "GAGAGATAGATGATAGTG", "TTTT_AAAA", 0], }

    — Ken

Re: search for particular elements of hash with multiple values
by Anonymous Monk on Apr 13, 2017 at 19:39 UTC
    Use a loop(while) like in real life, that is walk into each room (values %hash), then check into each drawer (  $val->[666] eq ... ) stop looking when something something...
Re: search for particular elements of hash with multiple values
by Anonymous Monk on Apr 14, 2017 at 05:03 UTC
    This is what I have. Hope it makes sense! Since your hash has an array as the value, you just have to tell perl to check what specific thing you want to look at.
    #!/usr/bin/perl use warnings; use strict; #seting up your things #$barcode_hash{$prim_pair_number} = [ $fowprim, $revprim, $bc_pair, $b +c_pair_num ]; my %VAR1 = ( '1' => [ 'AGCTCGTTGTTCGATCCA', 'GAGAGATAGATGATAGTG', 'TTTT_CCCC', 0 ], '2' => [ 'AGCTCGTTGTTCGATCCA', 'GAGAGATAGATGATAGTG', 'TTTT_AAAA', 0 ] ); my $barcode_pair_35 = 'TTTT_CCCC'; #for each key in the hash, if the 3rd element (i.e. index 2) is equal +to $barcode_pair_35, #then incriment the 4th element (i.e. index 3) and print that you foun +d it. foreach my $prim_pair_num (sort keys %VAR1){ if($VAR1{$prim_pair_num}[2] eq $barcode_pair_35){ $VAR1{$prim_pair_num}[3]++;print "found it\n";print $V +AR1{$prim_pair_num}[3]."\n"; }#endof if }

      Wow ! Thank you for all the input NetWallah, haukex, Ken and Anonymous Monk. I have been learning a lot playing around with all these examples and reading some of the perldsc. I will test these some more, do a little more reading and determine what will be best for my code. This is fantastic !

Re: search for particular elements of hash with multiple values
by karlgoethebier (Abbot) on Apr 19, 2017 at 17:58 UTC

    MCE_Loop might also be an option:

    #!/usr/bin/env perl use warnings; use strict; use MCE::Loop; use feature qw(say); my $cpus = MCE::Util->get_ncpu() || 4; MCE::Loop::init { max_workers => $cpus, }; my %barcode_hash = ( 1 => [ 'AGCTCGTTGTTCGATCCA', 'GAGAGATAGATGATAGTG', 'TTTT_CCCC', 0 +], 2 => [ 'AGCTCGTTGTTCGATCCA', 'GAGAGATAGATGATAGTG', 'TTTT_AAAA', 0 +], 3 => [ 'AGCTCGTTGTTCGATCCA', 'GAGAGATAGATGATAGTG', 'TTTT_BBBB', 0 +], 4 => [ 'AGCTCGTTGTTCGATCCA', 'GAGAGATAGATGATAGTG', 'TTTT_AAAA', 0 +], ); my $barcode_pair_35 = 'TTTT_AAAA'; mce_loop { my ( $mce, $chunk_ref, $chunk_id) = @_; for (@$chunk_ref) { if ( $barcode_hash{$_}[2] eq $barcode_pair_35 ) { say qq(Found $barcode_hash{$_}[2] at $_); } } } keys %barcode_hash; __END__

    Update:

    Ok, some benchmarking.

    Playing around with $size might be worth the effort. Your mileage may vary. I hope i quoted haukex right and jumped to the right conclusions.

    #!/usr/bin/env perl use MCE::Loop; use Benchmark qw ( :hireswallclock cmpthese timethese ); use strict; use warnings; use feature qw(say); my $size = 10000; say $size; my $cpus = MCE::Util->get_ncpu() || 4; MCE::Loop::init { max_workers => $cpus, chunk_size => $size }; my $data = [ 'AGCTCGTTGTTCGATCCA', 'GAGAGATAGATGATAGTG', 'TTTT_CCCC', +0 ]; our %barcode_hash = map { $_ => $data } 1 .. 99998; $barcode_hash{99999} = [ 'AGCTCGTTGTTCGATCCA', 'GAGAGATAGATGATAGTG', 'TTTT_AAAA ', 0 ]; $barcode_hash{100000} = [ 'AGCTCGTTGTTCGATCCA', 'GAGAGATAGATGATAGTG', 'TTTT_AAAA ', 0 ]; our $barcode_pair_35 = 'TTTT_AAAA'; my $results = timethese( -10, { 'karl' => 'karl', 'haukex' => 'haukex', } ); cmpthese($results); sub haukex { our %barcode_hash; our $barcode_pair_35; for my $key ( sort keys %barcode_hash ) { 1 if $barcode_hash{$key}[2] eq $barcode_pair_35; } } sub karl { our %barcode_hash; our $barcode_pair_35; mce_loop { my ( $mce, $chunk_ref, $chunk_id ) = @_; for (@$chunk_ref) { 1 if ( $barcode_hash{$_}[2] eq $barcode_pair_35 ); } } keys %barcode_hash; } __END__ haukex 6.74/s -- -47% karl 12.8/s 90% --

    Update 2: Shit! If i omit the sort i lose...

    Update 3: Slightly different picture with 1.000.000 keys and calculating them before benchmarking:

    my $size = 10000; my $cpus = MCE::Util->get_ncpu() || 4; MCE::Loop::init { max_workers => $cpus, chunk_size => $size }; our $max = scalar keys %barcode_hash; sub haukex { our %barcode_hash; our $barcode_pair_35; our $max; for ( 1.. $max ) { 1 if $barcode_hash{$_}[2] eq $barcode_pair_35; } } sub karl { our %barcode_hash; our $barcode_pair_35; our $max; mce_loop { my ( $mce, $chunk_ref, $chunk_id ) = @_; for (@$chunk_ref) { 1 if ( $barcode_hash{$_}[2] eq $barcode_pair_35 ); } } 1..$max; } haukex 2.29/s -- -38% karl 3.67/s 60% --

    Update 4: It's worth to install Sereal::Decoder.

    Regards, Karl

    «The Crux of the Biscuit is the Apostrophe»

    Furthermore I consider that Donald Trump must be impeached as soon as possible

      Update: This post was made to showcase another way to benchmark when involving parallel workers. Upon further review, the OP's post wants to increment $bc_pair_num which isn't done here. I will come back later and update the code. Also, sorting is not required when Perl has two fast ordered-hash implementations. I will try Hash::Ordered and MCE::Shared::Ordhash (constructed as non-shared). These modules are fast.

      Update: The OP does not mention sorting. Therefore please disregard the mentioning of the two ordered-hash implementations. Like haukex said, sorting is helpful during debugging sessions. This is true.

      Update: Posted a non-benchmark version, which increments the $bc_pair_num field.

      Greetings,

      Sometimes workers may not stop immediately like you want them to with various benchmark modules. The following is another way, based on karlgoethebier's example. For an array (mce_loop), the manager process chunks and sends the next chunk via IPC. For sequence (mce_loop_s) and with the bounds_only option, workers compute the next offset begin and end boundaries. Thus, runs with lesser overhead.

      If you must set the chunk_size option, do not go over 8000 when processing an array. Perl performance degrades if you go higher. Better yet, simply comment out the chunk_size option or not set it. There's no easy formula for setting chunk_size. However, the default chunk_size => 'auto' for MCE Models do a good job for most cases.

      For arrays, check to see if Perl has Sereal::Decoder and Sereal::Encoder 3.015 or later installed. MCE will use Sereal 3.015+ for serialization if available. Otherwise, it defaults to Storable. The results are mind-boggling. The reason is that for arrays, MCE involves the manager process which chunks the array elements and sends via IPC. Even with that overhead, MCE runs faster. That requires the our vs my keyword on %barcode_hash and $barcode_pair_35. Thank you, karlgoethebier for this enlightenment.

      Results: run 50 times for 100000 keys: $max set to 1e5.

      $ perl demo.pl haukex 50 duration (haukex): 1.236 seconds found match: yes $ perl demo.pl karlary 50 duration (karlary): 0.825 seconds found match: yes $ perl demo.pl karlseq 50 duration (karlseq): 0.313 seconds found match: yes

      Results: run 50 times for 1 million keys: $max set to 1e6.

      $ perl demo.pl haukex 50 duration (haukex): 17.388 seconds found match: yes $ perl demo.pl karlary 50 duration (karlary): 7.633 seconds found match: yes $ perl demo.pl karlseq 50 duration (karlseq): 2.858 seconds found match: yes

      Demo script.

      #!/usr/bin/env perl use strict; use warnings; use feature qw( say ); use MCE::Loop; use Time::HiRes qw( time ); sub usage { warn "usage: $0 ( haukex | karlary | karlseq ) [ count ]\n\n"; exit 1; } my $func = shift || usage(); my $count = shift || 50; usage() unless main->can($func); my $cpus = MCE::Util->get_ncpu() || 4; my $max = 100000; MCE::Loop::init { max_workers => $cpus, chunk_size => 8000, # <-- do not go over 8000 bounds_only => 1 # <-- applies to sequence }; my $data = [ 'AGCTCGTTGTTCGATCCA', 'GAGAGATAGATGATAGTG', 'TTTT_CCCC', 0 ]; our %barcode_hash = map { $_ => $data } 1 .. $max - 2; $barcode_hash{ ($max - 1) } = [ 'AGCTCGTTGTTCGATCCA', 'GAGAGATAGATGATAGTG', 'TTTT_AAAA', 0 ]; $barcode_hash{ ($max) } = [ 'AGCTCGTTGTTCGATCCA', 'GAGAGATAGATGATAGTG', 'TTTT_AAAA', 0 ]; our $barcode_pair_35 = 'TTTT_AAAA'; { no strict 'refs'; my $start = time; my $ret; $ret = $func->() for 1 .. $count; printf "duration ($func): %0.03f seconds\n", time - $start; printf "found match: %s\n", $ret ? 'yes' : 'no'; } exit 0; sub haukex { # serial code my $ret = 0; for ( 1 .. $max ) { $ret = 1, last if $barcode_hash{$_}[2] eq $barcode_pair_35; } return $ret; } sub karlary { # workers receive next array chunk my @ret = mce_loop { my ( $mce, $chunk_ref, $chunk_id ) = @_; for ( @$chunk_ref ) { MCE->gather(1), MCE->abort(), last if ( $barcode_hash{$_}[2] eq $barcode_pair_35 ); } } 1 .. $max; # <-- for array 1 .. $max return @ret ? 1 : 0; } sub karlseq { # workers receive next sequence 'begin' and 'end' boundaries my @ret = mce_loop_s { my ( $mce, $chunk_ref, $chunk_id ) = @_; for ( $chunk_ref->[0] .. $chunk_ref->[1] ) { MCE->gather(1), MCE->abort(), last if ( $barcode_hash{$_}[2] eq $barcode_pair_35 ); } } 1, $max; # <-- for sequence 1, $max return @ret ? 1 : 0; }

      For the MCE bits, I used MCE->gather and MCE->abort. The abort method is helpful which stops all workers from processing more chunks. Thus, ending the job early.

      Update: Results from a Windows 7 VM configured with 4 cores and Strawberry Perl. I think Perl makes extra copies. Thus, involves extra time during spawning.

      Results: run 50 times for 100000 keys: $max set to 1e5.

      $ perl demo.pl haukex 50 duration (haukex): 1.232 seconds found match: yes $ perl demo.pl karlary 50 duration (karlary): 1.482 seconds found match: yes $ perl demo.pl karlseq 50 duration (karlseq): 0.858 seconds found match: yes

      Results: run 50 times for 1 million keys: $max set to 1e6.

      $ perl demo.pl haukex 50 duration (haukex): 20.108 seconds found match: yes $ perl demo.pl karlary 50 duration (karlary): 16.770 seconds found match: yes $ perl demo.pl karlseq 50 duration (karlseq): 11.419 seconds found match: yes

      Update: Also tested Perl from the Cygwin environment. Here, it seems workers are spawned instantly after the initial creation. This is see via the task manager.

      Results: run 50 times for 100000 keys: $max set to 1e5.

      $ perl demo.pl haukex 50 duration (haukex): 1.607 seconds found match: yes $ perl demo.pl karlary 50 duration (karlary): 1.529 seconds found match: yes $ perl demo.pl karlseq 50 duration (karlseq): 0.749 seconds found match: yes

      Results: run 50 times for 1 million keys: $max set to 1e6.

      $ perl demo.pl haukex 50 duration (haukex): 25.194 seconds found match: yes $ perl demo.pl karlary 50 duration (karlary): 14.446 seconds found match: yes $ perl demo.pl karlseq 50 duration (karlseq): 7.051 seconds found match: yes

      Regards, Mario.

        To display the location, I changed the code accordingly and run 1 time. Running serially seems fast enough. Basically, Perl completes in less than 1 second for 1 million keys from an Intel i7 Haswell chip running at 2.6 GHz. Nethertheless, this post was written to demonstrate workers sending data to the manager-process for STDOUT via MCE->print(...).

        Results: run 1 and 2 times for 1 million keys: $max set to 1e6.

        $ perl demo2.pl haukex 1 Found at 999999 Found at 1000000 duration (haukex): 0.336 seconds $ perl demo2.pl karlary 1 Found at 999999 Found at 1000000 duration (karlary): 0.231 seconds $ perl demo2.pl karlseq 1 Found at 999999 Found at 1000000 duration (karlseq): 0.122 seconds
        $ perl demo2.pl haukex 2 Found at 999999 Found at 1000000 Found at 999999 Found at 1000000 duration (haukex): 0.665 seconds $ perl demo2.pl karlary 2 Found at 999999 Found at 1000000 Found at 999999 Found at 1000000 duration (karlary): 0.477 seconds $ perl demo2.pl karlseq 2 Found at 999999 Found at 1000000 Found at 999999 Found at 1000000 duration (karlseq): 0.253 seconds

        Below, workers report the location.

        ... { no strict 'refs'; my $start = time; $func->() for 1 .. $count; printf "duration ($func): %0.03f seconds\n", time - $start; } ... sub haukex { # serial code for my $key ( 1 .. $max ) { print "Found at $key\n" if ( $barcode_hash{$key}[2] eq $barcode_pair_35 ); } return; } sub karlary { # workers receive next array chunk mce_loop { my ( $mce, $chunk_ref, $chunk_id ) = @_; for my $key ( @$chunk_ref ) { MCE->print("Found at $key\n") if ( $barcode_hash{$key}[2] eq $barcode_pair_35 ); } } 1 .. $max; # <-- for array 1 .. $max return; } sub karlseq { # workers receive next sequence 'begin' and 'end' boundaries mce_loop_s { my ( $mce, $chunk_ref, $chunk_id ) = @_; for my $key ( $chunk_ref->[0] .. $chunk_ref->[1] ) { MCE->print("Found at $key\n") if ( $barcode_hash{$key}[2] eq $barcode_pair_35 ); } } 1, $max; # <-- for sequence 1, $max return; }
Re: search for particular elements of hash with multiple values
by marioroy (Prior) on Apr 21, 2017 at 02:01 UTC

    Hello pmpmmpmp and welcome to the monastery.

    Update: Updated the demonstration. The karlhash function requires MCE 1.828 to be released soon.

    After responding to karlgoethebier's post, realized that I wasn't incrementing the $bc_pair_num field. Running parallel may become important when the operation is complex and requires extra CPU time which is not the case here. For learning reasons, made this demo simply to better understand parallelization for this use case. Thank you. Also credits to several monks whom have responded with elegant solutions, which I borrowed in making this demonstration.

    # to run perl demo.pl haukex perl demo.pl karlary perl demo.pl karlseq

    Today, enjoyed writing Perl code.

    #!/usr/bin/env perl # Re^2: search for particular elements of hash with multiple values # http://www.perlmonks.org/?node_id=1188488 use strict; use warnings; use feature qw ( say ); use MCE::Loop; use constant { _FOWPRIM => 0, _REVPRIM => 1, _BC_PAIR => 2, _BC_PNUM => 3 }; # Note: The karlhash function requires MCE 1.828 to be released soon. sub usage { warn "usage: $0 ( haukex | karlary | karlhash | karlseq )\n\n"; exit 1; } my $func_name = shift || usage(); usage() unless __PACKAGE__->can($func_name); my $max = 1000000; my $data = [ 'AGCTCGTTGTTCGATCCA', 'GAGAGATAGATGATAGTG', 'TTTT_CCCC', 0 ]; our %barcode_hash = map { $_ => $data } 1 .. $max - 2; $barcode_hash{ ($max - 1) } = [ 'AGCTCGTTGTTCGATCCA', 'GAGAGATAGATGATAGTG', 'TTTT_AAAA', 0 ]; $barcode_hash{ ($max) } = [ 'AGCTCGTTGTTCGATCCA', 'GAGAGATAGATGATAGTG', 'TTTT_AAAA', 0 ]; our $barcode_pair_35 = 'TTTT_AAAA'; $| = 1; { no strict qw ( refs ); my @found = $func_name->(); # increment the bc_pair_num field $barcode_hash{$_}[_BC_PNUM]++ for @found; # display summary and found keys printf "Found %d keys\n", scalar @found; say join ', ', sort { $a <=> $b } @found; } exit(0); sub haukex { # Serial code. my @found; for my $key (1 .. $max) { if ($barcode_hash{$key}[_BC_PAIR] eq $barcode_pair_35) { say "Found at $key"; push @found, $key; } } return @found; } sub karlary { # Workers receive next array chunk. MCE::Loop::init { max_workers => 4 }; my @found = mce_loop { my ($mce, $chunk_ref, $chunk_id) = @_; for my $key ( @$chunk_ref ) { if ($barcode_hash{$key}[_BC_PAIR] eq $barcode_pair_35) { MCE->say("Found at $key, chunk_id: $chunk_id"); MCE->gather($key); } } } 1 .. $max; MCE::Loop::finish(); return @found; } sub karlhash { # Workers receive next array chunk. MCE::Loop::init { max_workers => 4 }; # Must have MCE 1.828 or later to run this demonstration. # See http://www.perlmonks.org/?node_id=1188593 for a faster # solution using hashref as input by spawning workers early. # That will run better on the Windows platform. my @found = mce_loop { my ($mce, $chunk_ref, $chunk_id) = @_; for my $key ( keys %{ $chunk_ref } ) { if ($chunk_ref->{$key}[_BC_PAIR] eq $barcode_pair_35) { MCE->say("Found at $key, chunk_id: $chunk_id"); MCE->gather($key); } } } \%barcode_hash; MCE::Loop::finish(); return @found; } sub karlseq { # Workers receive next sequence 'begin' and 'end' boundaries. MCE::Loop::init { max_workers => 4, bounds_only => 1 }; # The bounds_only option applies to sequence which means for # workers to compute the begin and end boundaries only and # not the items in between. The result is lesser overhead. my @found = mce_loop_s { my ($mce, $chunk_ref, $chunk_id) = @_; for my $key ($chunk_ref->[0] .. $chunk_ref->[1]) { if ($barcode_hash{$key}[_BC_PAIR] eq $barcode_pair_35) { MCE->say("Found at $key, chunk_id: $chunk_id"); MCE->gather($key); } } } 1, $max; MCE::Loop::finish(); return @found; }

    Regards, Mario.

      The upcoming MCE update 1.828 allows a hash_ref as input_data. MCE workers may be spawned early to prevent Perl from making extra copies. I made this for a fellow Monk who pinged me to look at this thread. The OP's case does not require parallelization though. However, impoved MCE if one were to process a big hash later on involving complex operations. MCE 1.828 will be released soon. Among other things, signal handling is improved. Also 14% reduction in memory consumption made possible by loading Symbol, Fcntl, and File::Path on demand.

      #!/usr/bin/env perl # Re^2: search for particular elements of hash with multiple values # http://www.perlmonks.org/?node_id=1188593 use strict; use warnings; use feature qw ( say ); use MCE; use constant { _FOWPRIM => 0, _REVPRIM => 1, _BC_PAIR => 2, _BC_PNUM => 3 }; our $barcode_pair_35 = 'TTTT_AAAA'; # The core MCE API defaults to 1 for chunk_size. my $mce = MCE->new( max_workers => 4, chunk_size => 8000, user_func => sub { my ($mce, $chunk_ref, $chunk_id) = @_; for my $key ( keys %{ $chunk_ref } ) { if ($chunk_ref->{$key}[_BC_PAIR] eq $barcode_pair_35) { MCE->say("Found at $key"); MCE->gather($key); } } } )->spawn; my $max = 1000000; my $data = [ 'AGCTCGTTGTTCGATCCA', 'GAGAGATAGATGATAGTG', 'TTTT_CCCC', 0 ]; our %barcode_hash = map { $_ => $data } 1 .. $max - 2; $barcode_hash{ ($max - 1) } = [ 'AGCTCGTTGTTCGATCCA', 'GAGAGATAGATGATAGTG', 'TTTT_AAAA', 0 ]; $barcode_hash{ $max } = [ 'AGCTCGTTGTTCGATCCA', 'GAGAGATAGATGATAGTG', 'TTTT_AAAA', 0 ]; my @found; $mce->process( { gather => \@found }, \%barcode_hash ); $mce->shutdown; # increment bc_pair_num field $barcode_hash{$_}[_BC_PNUM]++ for @found; # display summary and found keys printf "Found %d keys\n", scalar @found; say join ', ', sort { $a <=> $b } @found;

Log In?
Username:
Password:

What's my password?
Create A New User
Domain Nodelet?
Node Status?
node history
Node Type: perlquestion [id://1187870]
Front-paged by Corion
help
Chatterbox?
and the web crawler heard nothing...

How do I use this?Last hourOther CB clients
Other Users?
Others browsing the Monastery: (5)
As of 2024-03-29 08:41 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?

    No recent polls found