I was, I realize now (thanks to your watchfulness), obsessing over the multiple responses offering a hash as the solution. I still think those represent something close to cargo-culting a meme (rather than actual code) -- and not an optimal solution, since, if I read the wisdom of the sages correctly (and if they're right, of course), using a hash would be at least as memory-intensive as the original approach, and probably more so.
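For concreteness, the hash approach under discussion presumably looks something like this minimal sketch (the one-id-per-line layout and the file names are my assumptions, not the OP's actual data):

    #!/usr/bin/perl
    use strict;
    use warnings;

    # Load every id from the first file into a hash. This is the
    # crux of the memory objection: %seen holds all of file one in
    # RAM at once.
    my %seen;
    open my $first, '<', 'first_ids.txt' or die "first_ids.txt: $!";
    while (my $id = <$first>) {
        chomp $id;
        $seen{$id} = 1;
    }
    close $first;

    # Stream the second file, printing ids with no match in the first.
    open my $second, '<', 'second_ids.txt' or die "second_ids.txt: $!";
    while (my $id = <$second>) {
        chomp $id;
        print "$id\n" unless exists $seen{$id};
    }
    close $second;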
That's also an issue with map and grep (cf. Eliya's observations, above), but perhaps less so than with a hash (that's another test I haven't undertaken, but one which might lead to a publishable finding). And in the same node, Eliya makes a cogent point (echoed in a slightly different context by dave_the_m's code): there are a variety of ways to attack the OP's problem with reduced memory demand. Yet another might be a step-wise solution: first, separate the id portion of the first dataset into a file of its own; then identify the ids in the second file that don't have identical (or identically normalized, if that's involved, too) values.
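Here's a minimal sketch of that step-wise idea, assuming tab-delimited records with the id in the first column; the field layout and file names are hypothetical, not the OP's actual data. Handing the sorting off to the system sort(1), which spills to disk rather than RAM, lets the final comparison hold only one line of each file in memory at a time:

    #!/usr/bin/perl
    use strict;
    use warnings;

    # Step 1: strip the id field out of the first dataset into a
    # file of its own (tab-delimited layout and file names assumed).
    open my $in,  '<', 'first_dataset.txt' or die "first_dataset.txt: $!";
    open my $out, '>', 'first_ids.txt'     or die "first_ids.txt: $!";
    while (my $line = <$in>) {
        chomp $line;
        my ($id) = split /\t/, $line;
        print {$out} "$id\n";
    }
    close $in;
    close $out;

    # Step 2: sort both id lists externally, then walk them in
    # parallel, printing ids from the second list that have no
    # match in the first.
    system('sort -o first_ids.sorted first_ids.txt')   == 0 or die "sort: $?";
    system('sort -o second_ids.sorted second_ids.txt') == 0 or die "sort: $?";

    open my $sorted1, '<', 'first_ids.sorted'  or die $!;
    open my $sorted2, '<', 'second_ids.sorted' or die $!;
    my $id1 = <$sorted1>;
    while (defined(my $id2 = <$sorted2>)) {
        $id1 = <$sorted1> while defined $id1 && $id1 lt $id2;  # advance list one
        print $id2 unless defined $id1 && $id1 eq $id2;        # unmatched id
    }

The parallel walk trades disk I/O for RAM, and works only because both lists are sorted first.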
But, again, ++ for casting a sharp eye on the prior responses.