julio_514 has asked for the wisdom of the Perl Monks concerning the following question:
Dear monks, after a long time I seek your wisdom once again!
I have two big input files of about 1.5 million rows each (each row consists of a unique ID number). Let's call them file#1 and file#2. The vast majority of these ID numbers occur once in BOTH files, and some ID numbers are exclusive to file#1 OR file#2 only.
What I want to do is produce three output files: 1) one with the IDs occurring in both input files, 2) one with the IDs unique to file#1, and 3) one with the IDs unique to file#2.
I mean... I know how to write the code for this kind of thing, but it's taking forever the way I wrote it. It's the first time I have had to deal with huge files like this... Does anyone have a suggestion for an efficient implementation of this kind of task?
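For reference, a minimal hash-based sketch of one common approach (assuming one ID per line, and that roughly 1.5 million IDs per file fit comfortably in memory; the file names below are placeholders):

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Sketch only: one ID per line, file names are placeholders.
# %seen maps each ID to a bitmask: 1 = seen in file#1, 2 = seen in file#2.
my %seen;

open my $fh1, '<', 'file1.txt' or die "file1.txt: $!";
while ( my $id = <$fh1> ) {
    chomp $id;
    $seen{$id} |= 1;
}
close $fh1;

open my $fh2, '<', 'file2.txt' or die "file2.txt: $!";
while ( my $id = <$fh2> ) {
    chomp $id;
    $seen{$id} |= 2;
}
close $fh2;

open my $both,  '>', 'both.txt'       or die "both.txt: $!";
open my $only1, '>', 'only_file1.txt' or die "only_file1.txt: $!";
open my $only2, '>', 'only_file2.txt' or die "only_file2.txt: $!";

# One pass over the hash routes each ID to the right output file.
for my $id ( keys %seen ) {
    if    ( $seen{$id} == 3 ) { print {$both}  "$id\n" }
    elsif ( $seen{$id} == 1 ) { print {$only1} "$id\n" }
    else                      { print {$only2} "$id\n" }
}
```

A single pass over each input plus one pass over the hash keeps the work linear; if memory were tight, sorting both files and doing a merge-style comparison would be the usual alternative.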
Replies are listed 'Best First'.
Re: Help for finding duplicates in huge files
by roboticus (Chancellor) on Jul 16, 2011 at 02:45 UTC

Re: Help for finding duplicates in huge files
by Marshall (Canon) on Jul 16, 2011 at 04:07 UTC

Re: Help for finding duplicates in huge files
by CountZero (Bishop) on Jul 16, 2011 at 10:24 UTC

Re: Help for finding duplicates in huge files
by sundialsvc4 (Abbot) on Jul 16, 2011 at 11:46 UTC

Re: Help for finding duplicates in huge files
by Anonymous Monk on Jul 16, 2011 at 02:12 UTC

Re: Help for finding duplicates in huge files
by julio_514 (Acolyte) on Jul 16, 2011 at 20:30 UTC