in reply to Re^2: System call doesn't work when there is a large amount of data in a hash
in thread System call doesn't work when there is a large amount of data in a hash

Hi again,

I'll just suggest once more that you let go of the idea that you must load all your data into an in-memory hash in order for your program to be fast. For one very fast approach please look at mce_map_f in MCE::Map (also by the learned marioroy) which is written especially for optimized parallel processing of huge files.
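
For illustration, a minimal sketch of mce_map_f, assuming MCE is installed from CPAN. The temporary file and its contents are stand-ins invented for this example, since the real data file is not shown in the thread; the block simply returns each line's length.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);
use MCE::Map;

# Stand-in input (hypothetical): a small temporary file instead of the real data.
my ($fh, $path) = tempfile(UNLINK => 1);
print {$fh} "ACGT\nAC\nACGTACGT\n";
close $fh;

# mce_map_f reads the file in chunks, hands each line to the block as $_
# in a pool of worker processes, and collects the block's return values.
my @lengths = mce_map_f { chomp; length $_ } $path;

# Shut down the worker pool once the parallel work is done.
MCE::Map->finish;

print "@lengths\n";
```

The point of the sketch is that the file is streamed through the workers rather than loaded into one big structure first; whether that fits your algorithm depends on how much each line needs to know about the others.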

(As an aside, have you profiled your code? I would think that Perl could load data from anywhere (file, database, whatever) faster than a shell call to an external analytical program would return ... or does your program not expect a response?)
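
Short of a full profiler run, a rough way to check that suspicion is to time both things directly. The sketch below uses only core modules; the hash size and the trivial child process (a second perl that does nothing) are assumptions chosen for illustration, not your actual workload.

```perl
use strict;
use warnings;
use Time::HiRes qw(time);

# Illustrative data only: 100_000 keys standing in for the real hash.
my %h = map { $_ => 1 } 1 .. 100_000;

# Time 100_000 in-memory hash lookups.
my $t0   = time;
my $hits = 0;
$hits += exists $h{$_} ? 1 : 0 for 1 .. 100_000;
my $lookup_secs = time - $t0;

# Time a single system() call that starts a do-nothing perl ($^X is the
# path of the currently running interpreter).
$t0 = time;
system($^X, '-e', '1') == 0 or die "system failed: $?";
my $system_secs = time - $t0;

printf "100000 lookups: %.4fs, one system call: %.4fs\n",
    $lookup_secs, $system_secs;
```

Numbers will vary by machine, and system() forks the current process, so its cost may also depend on how much memory the parent is holding.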

As far as your finding that

"parallelisation of the code after loading the hashes ... turned out slowing down the process or impossible because it would duplicate the hash"
... please see MCE::Shared::Hash.
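
A minimal sketch of a shared hash, assuming MCE::Shared (which ships MCE::Hobo) is installed. The key names and values are made up for the example. The hash lives in a single manager process, so workers do not each get their own copy.

```perl
use strict;
use warnings;
use MCE::Hobo;
use MCE::Shared;

# One hash, held by the shared-manager process and visible to all workers.
my $counts = MCE::Shared->hash();

# Start three workers; each writes one (hypothetical) entry into the hash.
my @workers = map {
    my $id = $_;
    MCE::Hobo->create(sub { $counts->set("worker$id", $id * 10) });
} 1 .. 3;

# Wait for all workers to finish.
$_->join for @workers;

# Read the results back in the parent.
my @keys = sort $counts->keys;
printf "%s => %s\n", $_, $counts->get($_) for @keys;
```

Each get and set is a round trip to the manager process, so this trades per-access speed for not duplicating the data; for millions of lookups that cost is worth measuring before committing to it.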

Hope this helps!


The way forward always starts with a minimal test.

Re^4: System call doesn't work when there is a large amount of data in a hash
by Nicolasd (Acolyte) on May 01, 2020 at 11:37 UTC
    Hi,

    I think I tried MCE::Map a few years ago, but will check it to be sure. I tried many methods so that is why I am convinced about the big hash, but I could be wrong of course, as there is much of Perl I don't know.
    But small differences in speed will make a big difference because the script has to access the hash millions of times (I actually build 3 hashes), so some alternatives work fine at first sight, but on large datasets they slow down a lot.
    Similar software (in C++ or Python) usually needs even more memory than mine (although it uses a different graph-based method, so it is hard to compare).

    (As an aside, have you profiled your code? I would think that Perl could load data from anywhere (file, database, whatever) faster than a shell call to an external analytical program would return ... or does your program not expect a response?)
    Sorry I don't understand the question, is this about the system call? And I guess I didn't profile the code, as I don't know what that means :)

    I think I tried this one (MCE::Shared::Hash) and it turned out too slow, but again I need to verify this. I will check if I can find the code; otherwise I will try it again.
    Thanks
        I used that before, but I am not a big fan because it doesn't really show which parts consume the most time.
        Often parts that take the most time were not shown in the analysis.