in reply to MCE: How to access variables globally

G'day biohisham,

I believe this code is the guts of what you want:

#!/usr/bin/env perl

use strict;
use warnings;
use autodie;

use MCE::Loop;
use Data::Dumper;

my $data_file = 'DATA_F.dat';
my (%hash, %hash2);

{
    open (my $fh, '<', $data_file);
    while (<$fh>) {
        my ($k, $v) = split;
        $hash{$k} = $v;
    }
}

MCE::Loop::init {
    use_slurpio => 1,
    max_workers => 16,
    init_relay  => 0,
};

%hash2 = mce_loop_f {
    MCE->gather(split ' ', $$_);
} $data_file;

print Dumper \%hash;
print Dumper \%hash2;

See MCE and MCE::Loop for an explanation of what I've done there. The rest of the Perl code is very straightforward but, of course, do ask if there's anything you don't understand.

With this input (which I think should be the same as your original "DATA_F" input):

$ cat DATA_F.dat
1 one
2 two
3 three

I get this output:

$VAR1 = {
          '2' => 'two',
          '3' => 'three',
          '1' => 'one'
        };
$VAR1 = {
          '3' => 'three',
          '1' => 'one',
          '2' => 'two'
        };

I ran another test with much larger input. Due to the amount of data, it's in the spoiler.

— Ken

Re^2: MCE: How to access variables globally
by biohisham (Priest) on Dec 19, 2021 at 23:20 UTC

    Kia ora!

    Yes, this is the crux of it, but I avoided gathering from mce_loop_f in this way:

    %hash2 = mce_loop_f {
        MCE->gather(split ' ', $$_);
    } $data_file;
    My reason (and I may be totally wrong here) is that gather will only return %hash2 in this instance, while I am also interested in returning $counter2. The docs show that gather can be called multiple times, but doing so would complicate teasing apart the output returned from mce_loop_f{}. It would be great if gather could behave a bit like a sub, so that what you see is what you get:
    # hypothetical code:
    # alas, if only gather could gather two or more data types
    (%hash2, $counter2) = mce_loop_f {
        my $internal_counter2++;
        MCE->gather(split '\s', $$_, $internal_counter2);    # return data types
    } $DATA_F;
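One way to get both results out of a single loop is to give MCE::Loop::init a code reference for its gather option. MCE calls that callback in the manager process once per MCE->gather call, so each worker can send its chunk's line count followed by its key/value pairs, and the callback updates %hash2 and $counter2 directly. This is a minimal sketch under assumptions, not the only way MCE supports this: the temporary file stands in for DATA_F.dat, and "one counted line per well-formed key/value line" is an illustrative counting rule.

```perl
#!/usr/bin/env perl
use strict;
use warnings;
use File::Temp qw(tempfile);
use MCE::Loop;

# Illustrative input only: a temporary file standing in for DATA_F.dat.
my ($tmp_fh, $data_file) = tempfile(UNLINK => 1);
print {$tmp_fh} "1 one\n2 two\n3 three\n";
close $tmp_fh;

my %hash2;
my $counter2 = 0;

MCE::Loop::init {
    use_slurpio => 1,
    max_workers => 4,
    # A code ref runs in the manager process, once per MCE->gather call,
    # receiving whatever list the worker passed to gather.
    gather => sub {
        my ($lines_in_chunk, @pairs) = @_;
        $counter2 += $lines_in_chunk;
        while (my ($k, $v) = splice @pairs, 0, 2) {
            $hash2{$k} = $v;
        }
    },
};

# Called in void context so the gather callback above is used.
mce_loop_f {
    my ($mce, $chunk_ref, $chunk_id) = @_;
    my @pairs;
    my $lines = 0;
    for my $line (split /\n/, $$chunk_ref) {
        my ($k, $v) = split ' ', $line;
        next unless defined $v;    # skip blank or one-field lines
        push @pairs, $k, $v;
        $lines++;
    }
    # First element: this chunk's line count; the rest: key/value pairs.
    MCE->gather($lines, @pairs);
} $data_file;

MCE::Loop::finish;

print "Lines counted: $counter2\n";
print "$_ => $hash2{$_}\n" for sort keys %hash2;
```

Because both the sum and the hash assignments are order-independent, there is no need to preserve chunk order here; if later chunks must overwrite earlier ones for duplicate keys, chunk order would matter and this sketch would need adjusting.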


    Something or the other, a monk since 2009

      If all you need is a count of the number of iterations, and the keys are unique, you can use something as simple as:

      $ perl -E 'my %x = (a=>1, b=>2, c=>3); say "Count: ", 0+keys(%x)'
      Count: 3

      If the situation is more complex than that (non-unique keys, lines skipped for some reason, and so on), you'll need to provide more information; otherwise I'm only guessing, and I don't really want to waste time doing that. You should show some sample input: keep it short but realistic, and include examples of the exception cases. Then show the expected output for that input.

      If duplicate keys are encountered, should they be skipped, or should their values overwrite the previous ones? Other reasons lines might be skipped include: they're blank, they're comments, they don't match /^\S+\s+\S+$/, or something else. What else is special that I should know about?
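To make those questions concrete, here is a small single-process sketch (plain Perl, no MCE) of the two duplicate-key policies and the skip rules just listed. The sample lines and the $policy switch are hypothetical; the regex is the one from the paragraph.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Hypothetical sample input: a comment, a blank line,
# a malformed (three-field) line and a duplicate key.
my @lines = ("# comment", "1 one", "", "2 two", "bad line here", "1 uno");

# Hypothetical policy switch: 'skip' keeps the first value seen,
# 'overwrite' keeps the last one.
my $policy = 'skip';

my (%hash, %skipped);
for my $line (@lines) {
    # Order matters: "# comment" would otherwise pass the two-field regex.
    if ($line =~ /^\s*$/)       { $skipped{blank}++;     next }
    if ($line =~ /^\s*#/)       { $skipped{comment}++;   next }
    if ($line !~ /^\S+\s+\S+$/) { $skipped{malformed}++; next }

    my ($k, $v) = split ' ', $line;
    if (exists $hash{$k}) {
        $skipped{duplicate}++;    # counted under either policy
        next if $policy eq 'skip';
    }
    $hash{$k} = $v;
}

print "Kept: ", join(', ', map { "$_=$hash{$_}" } sort keys %hash), "\n";
print "Skipped $_: $skipped{$_}\n" for sort keys %skipped;
```

With $policy set to 'overwrite', key 1 would end up as 'uno' instead of 'one'; the skip counts stay the same.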

      When I saw your OP code, I thought the first (non-MCE) loop, and the two counters, were just for testing. Clearly, that was a poor guess; please help me out here.

      — Ken

        Thanks a lot, Ken, for your help; your guess is spot on. The keys represent states, which repeat across different lines and columns, and each state has associated Proba and ErrorProba values against a timestamp. The idea is to turn the states into columns of state name\tProb[state]\tError[state] and align them against the time. The original file is generated by mathematical modelling software (Boolean modelling of networks), and its format has to be changed before I can do further analysis. It's a large file, so I split it and read it in chunks to avoid the OOM killer.

        The header looks like this:

        Time TH ErrorTH H HD=0 State Proba ErrorProba State Proba ErrorProba State State Proba ErrorProba
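A minimal sketch of the reshaping described above, under several assumptions the thread does not confirm: columns are tab-separated; the first five columns are Time, TH, ErrorTH, H and HD=0; every remaining column comes in (State, Proba, ErrorProba) triples; and a state that does not appear at a given time is written as 0. The sample rows, state names and the Prob[...]/Error[...] column names are hypothetical.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Hypothetical rows: Time TH ErrorTH H HD=0, then (State, Proba, ErrorProba) triples.
my @rows = (
    "0\t0.1\t0.01\t0.5\t1\tA\t0.7\t0.02\tB\t0.3\t0.02",
    "1\t0.2\t0.01\t0.6\t1\tB\t0.4\t0.03\tC\t0.6\t0.03",
);

my (%prob, %err, %seen_state, @times);
for my $row (@rows) {
    # The five leading columns are split off; only Time is used here.
    my ($time, $th, $err_th, $h, $hd0, @triples) = split /\t/, $row;
    push @times, $time;
    # Consume the rest three fields at a time.
    while (my ($state, $p, $e) = splice @triples, 0, 3) {
        $prob{$time}{$state} = $p;
        $err{$time}{$state}  = $e;
        $seen_state{$state}  = 1;
    }
}

# One Prob/Error column pair per state, in sorted order.
my @states = sort keys %seen_state;
print join("\t", 'Time', map { ("Prob[$_]", "Error[$_]") } @states), "\n";
for my $t (@times) {
    # Assumption: a state absent at this time step is reported as 0.
    print join("\t", $t,
        map { ($prob{$t}{$_} // 0, $err{$t}{$_} // 0) } @states), "\n";
}
```

This keeps every row in memory so it can collect the full list of state names before printing. For a file large enough to need chunking, a first pass that only collects state names, followed by a second pass that prints row by row, would keep memory use bounded.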

        Here is what my minimum working example looks like:

