in reply to Iterating through Two Arrays. Is there a better use of memory?

I'll accept your analysis that memory use may be a problem but CPU use probably isn't (though that situation is rare enough that I'm privately a bit skeptical).

You can store only the smaller file in memory, then read through the larger file, checking each entry against the in-memory data just after you read it. That should cut your memory use by half or more (depending on how asymmetrical your real pairs of files are), and it shouldn't cost anything in performance. It does, however, leave you without the larger file's data in memory, which may matter if a later stage of processing needs it. Write your results to another file as you go.

This may not fit the overall workflow of your problem. In general, though, if you're approaching memory limits, designs that stream data through rather than holding it all in memory at once are your winning strategy.
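Here is a minimal sketch of the streaming approach described above. It assumes both files hold one entry per line and that "matching" means identical text once the trailing newline is stripped; the function name `stream_matches` and that matching rule are hypothetical, so adapt them to whatever comparison your real data needs. It picks the smaller file by size, loads only that one into a set (so each lookup is cheap), streams the larger file line by line, and writes matches out as it goes.

```python
import os


def stream_matches(path_a, path_b, out_path):
    """Write each line of the larger file that also appears in the smaller one.

    Hypothetical matching rule: one entry per line, compared as exact text
    after stripping the trailing newline. Returns the number of matches written.
    """
    # Keep only the smaller file in memory (by byte size on disk).
    if os.path.getsize(path_a) <= os.path.getsize(path_b):
        small_path, large_path = path_a, path_b
    else:
        small_path, large_path = path_b, path_a

    # A set gives average O(1) membership checks for each streamed entry.
    with open(small_path, encoding="utf-8") as small:
        wanted = {line.rstrip("\n") for line in small}

    matches = 0
    # Stream the larger file; only the current line is held at any time.
    with open(large_path, encoding="utf-8") as large, \
         open(out_path, "w", encoding="utf-8") as out:
        for line in large:
            entry = line.rstrip("\n")
            if entry in wanted:
                out.write(entry + "\n")  # write results as we go
                matches += 1
    return matches
```

Peak memory here is roughly the size of the smaller file's set plus one line of the larger file, regardless of how big the larger file grows.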
