in reply to Efficient search through a huge dataset
Here's how to find the records common to both files:

$ comm -12 <(sort -u file1) <(sort -u file2)

(If the files are already sorted, you can pass them directly to comm without first running them through sort. Here, I'm using the bash shell's <(command) process-substitution syntax to avoid having to deal with temporary files for holding the sorted records.)
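If your shell lacks the <(command) syntax (plain POSIX sh, for example), you can do the same comparison with explicit temporary files. This is a minimal sketch that assumes mktemp is available. The name common_records is a hypothetical helper for illustration, not part of any tool:

```shell
#!/bin/sh
# Sketch: temp-file equivalent of
#   comm -12 <(sort -u file1) <(sort -u file2)
# for shells without process substitution.
# common_records is a hypothetical helper name.
common_records() {
    tmp1=$(mktemp) || return 1
    tmp2=$(mktemp) || { rm -f "$tmp1"; return 1; }

    # Sort and de-duplicate each input into its own temporary file,
    # then let comm suppress column 1 (only in file 1) and column 2
    # (only in file 2), leaving the lines common to both.
    sort -u "$1" > "$tmp1" &&
    sort -u "$2" > "$tmp2" &&
    comm -12 "$tmp1" "$tmp2"
    status=$?

    # Remove the temporary files whether or not the commands succeeded.
    rm -f "$tmp1" "$tmp2"
    return "$status"
}
```

For example, common_records file1 file2 prints the same lines as the comm -12 command above. The cost is the bookkeeping that process substitution saves you: creating, naming, and cleaning up the intermediate files.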
Here's how to find the records that are unique to the first file:
$ comm -23 <(sort -u file1) <(sort -u file2)

Most sort implementations are fast and switch to external (file-based) sorting algorithms when the input is too large to fit in memory, so you don't need to worry about input size.
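When the inputs really are huge, you can still give sort some help. The following is a sketch that assumes bash (for <(command)) and GNU coreutils sort, where -S sets the in-memory buffer size and -T sets the directory for the temporary files sort spills to. The function names and the SORT_BUFFER / SORT_TMPDIR variables are hypothetical, chosen for illustration:

```shell
#!/usr/bin/env bash
# Sketch: the comm comparisons above, tuned for very large inputs.
# Assumes bash and GNU coreutils sort (-S buffer size, -T temp directory).
# All function and variable names here are hypothetical.

# Compare raw bytes instead of locale collation rules. This is often
# faster, and exporting it makes sort and comm agree on the ordering.
export LC_ALL=C

# Hypothetical tunables; override them from the environment if needed.
SORT_BUFFER=${SORT_BUFFER:-256M}
SORT_TMPDIR=${SORT_TMPDIR:-${TMPDIR:-/tmp}}

# Sort and de-duplicate one file using the tuned settings.
sorted_unique() {
    sort -u -S "$SORT_BUFFER" -T "$SORT_TMPDIR" -- "$1"
}

# Records that appear in both files (comm -12).
in_both() {
    comm -12 <(sorted_unique "$1") <(sorted_unique "$2")
}

# Records that appear only in the first file (comm -23).
only_in_first() {
    comm -23 <(sorted_unique "$1") <(sorted_unique "$2")
}
```

For example, only_in_first file1 file2 > only1.txt. Pointing SORT_TMPDIR at a filesystem with plenty of free space matters most when the inputs far exceed the buffer, because that is where sort writes its intermediate files.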
Cheers,
Tom
Tom Moertel