You have 2 files. If the first is large (a few hundred MB), I'd expect perl to run out of memory before it finishes loading it. If you're relying on virtual memory, the load might take a long time. If it does succeed in loading but is swapping, then the second file will take forever to process. Either way, you aren't going to handle 8 GB of data in RAM at once.
I would sort both files first. sort is available as a standard Unix utility; if you don't have it, Sort::External is a pure Perl alternative (I have not used it myself). Then process both files in parallel, the idea being that sequences come up in the same order in both files. You keep 2 filehandles (one for each file) and 2 current lines (one for each file), and you always read the next line from whichever file's current line is smaller, processing a match when you find one. That way you never hold more than two lines in RAM.
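The parallel walk over two sorted files can be sketched as below. This is only an illustration in Python, not tested against your data; the same loop translates directly to two Perl filehandles. The names `key_of` and `merge_matches` are hypothetical, and the sketch assumes that the key is the first whitespace-separated field of each line, that each key appears at most once per file, and that both files were sorted in plain byte order (for example with `LC_ALL=C sort`) so that string comparison agrees with the sort order.

```python
import io


def key_of(line):
    """Hypothetical key extractor: the first whitespace-separated field."""
    parts = line.split(None, 1)
    return parts[0] if parts else ""


def merge_matches(fh_a, fh_b):
    """Yield (line_a, line_b) for every key present in both sorted inputs.

    Assumes both inputs are sorted by key in the same order and that
    keys are unique within each input. Only one line per input is held
    in memory at any time.
    """
    line_a = fh_a.readline()
    line_b = fh_b.readline()
    while line_a and line_b:          # readline() returns "" at end of file
        ka, kb = key_of(line_a), key_of(line_b)
        if ka < kb:
            line_a = fh_a.readline()  # advance the file that is behind
        elif ka > kb:
            line_b = fh_b.readline()
        else:
            yield line_a.rstrip("\n"), line_b.rstrip("\n")
            line_a = fh_a.readline()
            line_b = fh_b.readline()


if __name__ == "__main__":
    # In-memory stand-ins for the two sorted files.
    a = io.StringIO("AAC 1\nACG 2\nTTT 3\n")
    b = io.StringIO("ACG x\nGGG y\nTTT z\n")
    for left, right in merge_matches(a, b):
        print(left, "|", right)
```

With real data you would pass two handles opened on the sorted files instead of the `io.StringIO` objects. If keys can repeat within a file, the equal branch needs extra handling to pair up every combination, which this sketch does not attempt.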
Be warned that sorting 8 GB is liable to take 20 minutes or so on your machine.