in reply to Re^4: Memory issue with large array comparison
in thread Memory issue with large array comparison
I was, I realize now (thanks to your watchfulness), obsessing over the multiple responses offering a hash as the solution. I still think those come close to cargo-culting a meme (rather than offering actual code), and not an optimal solution either, since, if I read the wisdom of the sages correctly (and if they're right, of course), using a hash would be at least as memory intensive as the array approach, and probably more so.
That's also an issue with map and grep (cf. Eliya's observations, above), though perhaps less so than with a hash (that's another test I haven't undertaken, but which might lead to a publishable finding). And in the same node, Eliya makes a cogent point (echoed in a slightly different context by dave_the_m's code): there are a variety of ways to attack the OP's problem with reduced memory demand. Yet another might be a step-wise solution: first, separate the id portion of the first dataset into a file of its own; then identify the ids in the second file that don't have identical (or identically normalized, if that's involved too) counterparts. A rough sketch of that idea follows.
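Very roughly, and assuming tab-separated records whose first field is the id (the OP's actual layout, and the file names here, are my guesses), that step-wise approach might look something like this:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Sketch of the step-wise idea: the data layout and file names below are
# assumptions, not the OP's actual format.
my ($file_a, $file_b) = ('dataset_a.txt', 'dataset_b.txt');

# Step 1: separate the id portion of the first dataset into a file of
# its own, streaming line by line so dataset A never sits in memory.
open my $in_a,  '<', $file_a     or die "open $file_a: $!";
open my $ids_a, '>', 'ids_a.txt' or die "open ids_a.txt: $!";
while (my $line = <$in_a>) {
    chomp $line;
    my ($id) = split /\t/, $line;     # keep only the id field
    next unless defined $id and length $id;
    print {$ids_a} "$id\n";
}
close $_ for $in_a, $ids_a;

# Step 2: load only the ids (far smaller than the full records) and
# stream through the second dataset, reporting ids with no identical
# counterpart in the first.
open my $ids_fh, '<', 'ids_a.txt' or die "open ids_a.txt: $!";
my %seen;
while (my $id = <$ids_fh>) {
    chomp $id;
    $seen{$id} = 1;
}
close $ids_fh;

open my $in_b, '<', $file_b or die "open $file_b: $!";
while (my $line = <$in_b>) {
    chomp $line;
    my ($id) = split /\t/, $line;
    next unless defined $id and length $id;
    print "$id not found in first dataset\n" unless $seen{$id};
}
close $in_b;
```

Yes, the bare ids still end up in a hash in step 2, but it's a hash of ids alone rather than whole records; if even that is too large, sorting both id files and walking them in parallel would avoid keeping anything substantial in memory at all.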
But, again, ++ for casting a sharp eye on the prior responses.
Replies are listed 'Best First'.
Re^6: Memory issue with large array comparison
by aaron_baugher (Curate) on May 26, 2012 at 03:27 UTC