You say you can't load both files into memory at the same time, but I wonder if you can load something important about every record from both files into memory at the same time.
How about an array of MD5 signatures, one per line? If you've got two files with 1.2 million rows each and an MD5 signature is 16 bytes long, then you should be able to index them both in around 40MB of memory (2 × 1.2 million × 16 bytes ≈ 38MB). If Perl adds too much overhead to the arrays (and it might), then use Tie::IntegerArray or just program it in C (heresy!).
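A minimal sketch of the MD5 idea, assuming each record is exactly one line. `build_index` and `diff_indexes` are hypothetical helper names. Instead of keeping a Perl array of digests, it packs the raw 16-byte digests into one sorted string, which avoids most of the per-element overhead, and then walks the two indexes in step like a merge to find deleted and inserted lines.

```perl
use strict;
use warnings;
use Digest::MD5 qw(md5);

# Build a compact index: the raw 16-byte MD5 of every line, sorted and
# concatenated into ONE string (16 bytes per record, no per-scalar overhead).
# Note: the sort temporarily unpacks into a list, so peak memory is higher.
sub build_index {
    my ($fh) = @_;
    my $packed = '';
    while (my $line = <$fh>) {
        chomp $line;
        $packed .= md5($line);    # md5() returns 16 binary bytes
    }
    return join '', sort { $a cmp $b } unpack('(a16)*', $packed);
}

# Walk two sorted indexes in step. A digest found only in $old is a deleted
# line; one found only in $new is an inserted line. Duplicate lines are
# counted, so a line that appears twice in $old and once in $new shows up
# as one deletion.
sub diff_indexes {
    my ($old, $new) = @_;
    my ($n_old, $n_new) = (length($old) / 16, length($new) / 16);
    my ($i, $j) = (0, 0);
    my (@deleted, @inserted);
    while ($i < $n_old || $j < $n_new) {
        my $od = $i < $n_old ? substr($old, 16 * $i, 16) : undef;
        my $nd = $j < $n_new ? substr($new, 16 * $j, 16) : undef;
        if (!defined $nd || (defined $od && $od lt $nd)) {
            push @deleted, unpack('H*', $od);     # only in old file
            $i++;
        } elsif (!defined $od || $nd lt $od) {
            push @inserted, unpack('H*', $nd);    # only in new file
            $j++;
        } else {
            $i++;                                 # same line in both files
            $j++;
        }
    }
    return (\@deleted, \@inserted);
}
```

Usage would look like `my ($del, $ins) = diff_indexes(build_index($old_fh), build_index($new_fh));`. The results are hex digests, so printing the actual changed lines takes a second pass over each file that re-hashes every line and checks it against the result lists.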
If the rows have a primary key, you might be able to use Bit::Vector to set up bitmaps to test for insertions and deletions. That would likely use less memory than an array of 16-byte MD5s, depending on how sparse your key space is.
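A sketch of the bitmap approach. It swaps in Perl's built-in `vec()` for Bit::Vector so it runs without CPAN modules, and it assumes (hypothetically) that each line holds just a non-negative integer primary key; in real files the key would be parsed out of each record. A bitmap costs one bit per possible key, so 1.2 million dense keys need only about 150KB, while sparse keys make it grow with the largest key value.

```perl
use strict;
use warnings;

# Hypothetical input format: one integer primary key per line.
# Sets bit number $key in a string used as a bitmap.
sub key_bitmap {
    my ($fh) = @_;
    my $bits = '';
    while (my $key = <$fh>) {
        chomp $key;
        vec($bits, $key, 1) = 1;
    }
    return $bits;
}

# Compare two bitmaps bit by bit. Reading past the end of a string with
# vec() returns 0, so bitmaps of different lengths compare correctly.
sub diff_bitmaps {
    my ($old, $new) = @_;
    my $len = length($old) > length($new) ? length($old) : length($new);
    my (@deleted, @inserted);
    for my $k (0 .. 8 * $len - 1) {
        my ($in_old, $in_new) = (vec($old, $k, 1), vec($new, $k, 1));
        push @deleted,  $k if $in_old && !$in_new;    # key removed
        push @inserted, $k if $in_new && !$in_old;    # key added
    }
    return (\@deleted, \@inserted);
}
```

This detects only inserted and deleted keys; a row whose key survives but whose other columns changed would still need something like the MD5 comparison.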
-sam