
Indeed. Though this also assumes that the "record-ids" are unique in both files -- which they may well be, of course.

For completeness, one would recommend checking the validity of the "record-ids" as the two files are read, to ensure that they are (a) numeric, (b) unique, and (c) in ascending order -- so that one can have confidence in the result.
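
A minimal sketch of such a validation pass (assuming the ids are plain integers appearing as the first whitespace-separated field of each line -- the field splitting would have to match the real record layout):

    #!/usr/bin/perl
    use strict;
    use warnings;

    my $prev;
    while ( my $line = <> ) {
        my ($id) = split ' ', $line;

        die "Non-numeric record-id at line $.\n"
            unless defined $id and $id =~ /^\d+$/;

        # strictly ascending ids are necessarily unique as well, so one
        # comparison covers checks (b) and (c) without having to keep
        # every id seen so far in memory
        die "Record-id $id at line $. is not greater than the previous one\n"
            if defined $prev and $id <= $prev;

        $prev = $id;
    }
    print "record-ids are numeric, unique and ascending\n";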

As usual it's worth examining the problem before writing the code. For example:
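
A rough, back-of-the-envelope check -- with purely illustrative numbers, say ids bounded by about 200 million and roughly 10 million ids in file-1 -- of what a vec() bit-string would cost against slurping every id into a hash:

    #!/usr/bin/perl
    use strict;
    use warnings;

    my $max_id   = 200_000_000;    # assumed upper bound on the record-ids
    my $id_count =  10_000_000;    # assumed number of ids in file-1

    # vec( $bits, $id, 1 ) needs one bit per possible id value
    printf "bit-vector: ~%.0f MB\n", $max_id / 8 / 2**20;

    # a hash key costs very roughly 100 bytes on a typical perl
    printf "hash slurp: ~%.0f MB\n", $id_count * 100 / 2**20;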

Ah well. Coding first -- analysis afterwards [1]. How often do we do that?

[1] As the Queen of Hearts might have it.

Re^3: Matching data between huge files
by est (Acolyte) on Aug 28, 2008 at 00:54 UTC
    Update: both of the files are *not* sorted, record_id is *not* unique in file-1, and file-2 is equally big (or bigger) and most likely going to change weekly.

    Having said that, I really like the solution given by BrowserUk in the sense that my Benchmark gives a much faster result compared to my linear solution, and I don't need to build any DB.

    I haven't checked the memory usage with "vec()", but I don't think I need to, as BrowserUk has given an estimated comparison with slurping it all into a hash :-) (a rough sketch of the general vec() idea is included below).

    Thanks.
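
    For reference, a rough sketch of the general vec() bit-vector idea -- not BrowserUk's actual code; it assumes the record-ids are non-negative integers appearing as the first whitespace-separated field of each line, and the file names are placeholders:

        #!/usr/bin/perl
        use strict;
        use warnings;

        my $bits = '';

        # pass 1: set one bit for every record-id that occurs in file-1
        # (duplicates simply set the same bit again, so uniqueness is not required)
        open my $in1, '<', 'file1.txt' or die "file1.txt: $!";
        while ( <$in1> ) {
            my ($id) = split ' ', $_;
            vec( $bits, $id, 1 ) = 1;
        }
        close $in1;

        # pass 2: print the lines of file-2 whose record-id was seen in file-1
        open my $in2, '<', 'file2.txt' or die "file2.txt: $!";
        while ( <$in2> ) {
            my ($id) = split ' ', $_;
            print if vec( $bits, $id, 1 );
        }
        close $in2;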