in reply to Memory issue with large array comparison

The "hash" solution is probably the fastest, but it is less flexible: it can only test for an exact match between an ID and the filename part of the filepath.

It will not work if you have to check whether the ID is mentioned *somewhere* in the filepath.
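
To make the difference concrete, here is a minimal sketch using only core Perl. The IDs and paths are made up for illustration; they are not from the original question.

```perl
use strict;
use warnings;
use File::Basename qw(basename);

# Hypothetical sample data, for illustration only.
my %is_id = map { $_ => 1 } qw(abc1234 xyz9876);
my @paths = (
    '/data/abc1234',               # filename is exactly an ID
    '/data/report_xyz9876.txt',    # ID is only part of the filename
    '/data/abc1234/readme.txt',    # ID appears in a directory name
);

# Exact hash lookup on the filename part: only the first path is found.
my @exact = grep { exists $is_id{ basename($_) } } @paths;

# Substring search: every path that mentions an ID anywhere is found.
my @anywhere = grep {
    my $path = $_;
    grep { index($path, $_) >= 0 } keys %is_id;
} @paths;

print "exact:    @exact\n";
print "anywhere: @anywhere\n";
```

The second `grep` is the slow "each path against each ID" approach; it is shown here only to illustrate what the hash lookup misses.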

Still, you do not have to despair! Nor do you have to check each filepath against each ID with a separate regex (which would be very slow).

Through the magic of Regexp::Assemble it only takes a program of a few lines:

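use Modern::Perl;
use Regexp::Assemble;
use autodie;

# Build one combined regex from all the IDs (one per line).
open my $PATTERNS, '<', './patterns.txt';
my $re_pattern = Regexp::Assemble->new->add(<$PATTERNS>)->re;
close $PATTERNS;

# Print every path that mentions none of the IDs.
# A plain while loop: "do {...} while" would run the block once before reading a line.
open my $PATHS, '<', './paths.txt';
while (<$PATHS>) {
    print unless /$re_pattern/;
}
close $PATHS;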
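
If Regexp::Assemble is not installed, the same idea can be sketched with core Perl by joining the IDs into one plain alternation. This is a swapped-in technique, not what Regexp::Assemble does internally (the module builds a more compact pattern), and the IDs and paths below are hypothetical stand-ins for the contents of patterns.txt and paths.txt.

```perl
use strict;
use warnings;

# Hypothetical sample data; in the program above these come from
# patterns.txt and paths.txt.
my @ids   = qw(abc1234 abd5678 xyz9876);
my @paths = qw(
    /data/abc1234.txt
    /data/unrelated.txt
    /archive/xyz9876/notes.txt
);

# One combined regex: quotemeta turns each ID into a literal string,
# and the alternation matches if any ID occurs anywhere in the path.
# (Assumes @ids is not empty; an empty pattern would match everything.)
my $alternation = join '|', map { quotemeta($_) } @ids;
my $re_pattern  = qr/$alternation/;

# Same filter as the program above: keep the paths that mention no ID.
my @unmatched = grep { !/$re_pattern/ } @paths;
print "$_\n" for @unmatched;
```

Each path is still tested only once, against a single compiled regex, rather than once per ID.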

CountZero

"A program should be light and agile, its subroutines connected like a string of pearls. The spirit and intent of the program should be retained throughout. There should be neither too little nor too much, neither needless loops nor useless variables, neither lack of structure nor overwhelming rigidity." - The Tao of Programming, 4.1 - Geoffrey James

My blog: Imperial Deltronics

Re^2: Memory issue with large array comparison
by aaron_baugher (Curate) on May 25, 2012 at 13:54 UTC

    Wow, that's an impressive module. Thanks for the tip. Out of curiosity, I benchmarked it against the hash solution: the hash is faster, of course, but only 13 times faster. As you say, there will be times when you can't use a standard hash lookup, so something like this is a good alternative. I had it print out the regex it built for 10,000 different keys (all 7-char random lowercase letters), and it was huge, yet it took only a little over a second to compare it against 50,000 different strings on my system. The Perl regex engine is amazing.

    Aaron B.
    Available for small or large Perl jobs; see my home node.
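
A comparison like the one described in the reply above could be set up roughly as follows. This is only a sketch under assumptions: the reply does not show its benchmark code, so the helper `random_word`, the variable names, and the choice of the core Benchmark module are all hypothetical, and Regexp::Assemble must be installed from CPAN. Timings will differ from machine to machine.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);
use Regexp::Assemble;

# Hypothetical helper: one random word of 7 lowercase letters.
sub random_word { join '', map { ('a' .. 'z')[ rand 26 ] } 1 .. 7 }

# Sizes taken from the reply: 10,000 keys, 50,000 strings to test.
my @keys    = map { random_word() } 1 .. 10_000;
my @strings = map { random_word() } 1 .. 50_000;

my %is_key = map { $_ => 1 } @keys;
my $re     = Regexp::Assemble->new->add(@keys)->re;

# Sanity check: keys and strings have the same length, so a regex
# match here is always an exact match and both counts must agree.
my $hash_hits  = grep { exists $is_key{$_} } @strings;
my $regex_hits = grep { /$re/ } @strings;
print "hash hits: $hash_hits, regex hits: $regex_hits\n";

# Run each variant for at least 1 CPU second and print a comparison table.
cmpthese( -1, {
    hash  => sub { my $n = grep { exists $is_key{$_} } @strings },
    regex => sub { my $n = grep { /$re/ } @strings },
} );
```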