in reply to How to process two files of over a million lines for changes
How about an array of MD5 signatures for each line? If you've got two files with 1.2 million rows each and an MD5 signature is 16 bytes long, then you should be able to index them both in around 40MB of memory (2 × 1.2M × 16 bytes ≈ 38.4MB). If Perl adds too much overhead to the arrays (and it might) then use Tie::IntegerArray or just program it in C (heresy!).
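A minimal sketch of that idea, assuming Digest::MD5 from core Perl and hypothetical filenames old.txt/new.txt; note that Perl's per-scalar overhead will push real memory use well above the raw 16 bytes per digest:

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Digest::MD5 qw(md5);

# Read a file into an array of 16-byte binary MD5 digests, one per line.
sub digest_lines {
    my ($path) = @_;
    open my $fh, '<', $path or die "can't open $path: $!";
    my @sigs;
    while (my $line = <$fh>) {
        push @sigs, md5($line);    # 16 bytes per line
    }
    close $fh;
    return \@sigs;
}

my $old = digest_lines('old.txt');
my $new = digest_lines('new.txt');

# Walk both arrays in parallel and report lines whose digests differ.
my $max = @$old > @$new ? scalar @$old : scalar @$new;
for my $i (0 .. $max - 1) {
    my $a = $i < @$old ? $old->[$i] : '';
    my $b = $i < @$new ? $new->[$i] : '';
    print "line ", $i + 1, " changed\n" if $a ne $b;
}
```

If the array overhead hurts, one cheap trick is to pack all the digests into a single string and index it with substr($sigs, $i * 16, 16), which gets you much closer to the raw 40MB figure.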
If the rows have a primary key you might be able to use Bit::Vector to set up bitmaps to test for insertions and deletions (see the sketch below). That would likely use less memory than an array of 16-byte MD5s, depending on how sparse your key-space is.
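Here's a rough sketch of the bitmap approach, assuming integer primary keys with a known upper bound (MAX_KEY below is a made-up constant) and that the key is the first field on each line:

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Bit::Vector;

# One bit per possible key; assumes keys are integers <= MAX_KEY.
use constant MAX_KEY => 2_000_000;

# Set the bit for each key seen in a file.
sub key_bitmap {
    my ($path) = @_;
    my $vec = Bit::Vector->new(MAX_KEY + 1);
    open my $fh, '<', $path or die "can't open $path: $!";
    while (<$fh>) {
        my ($key) = split;
        $vec->Bit_On($key);
    }
    close $fh;
    return $vec;
}

my $old = key_bitmap('old.txt');
my $new = key_bitmap('new.txt');

# Keys in new but not old are insertions; keys in old but not new
# are deletions.  AndNot() computes "first operand AND NOT second".
my $ins = Bit::Vector->new(MAX_KEY + 1);
my $del = Bit::Vector->new(MAX_KEY + 1);
$ins->AndNot($new, $old);
$del->AndNot($old, $new);

print "inserted keys: @{[ $ins->Index_List_Read ]}\n";
print "deleted keys:  @{[ $del->Index_List_Read ]}\n";
```

At one bit per key, a 2-million-key space costs about 250KB per bitmap, which is why this wins so handily over per-line digests when the keys are dense.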
-sam