in reply to Re^2: write to Disk instead of RAM without using modules
in thread write to Disk instead of RAM without using modules
Sometimes you can load just one file into memory and then scan the other files one at a time, line by line, without ever loading any of them entirely into memory. As a second step, you can compare the generated files, each holding the differences between one of the other files and file 1. These may (or may not) be much smaller than the original files, depending on the shape of your data.
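A minimal sketch of that first step, using only core Perl and plain filehandles. The sub name `write_missing_records` is hypothetical, and the sketch assumes one record per line with the whole line acting as the comparison key; real data will probably need a key extracted from each line instead.

```perl
use strict;
use warnings;

# Hypothetical helper: every record of $other_path that is absent from
# $ref_path is written to $diff_path. Returns the number of records written.
# Assumption: one record per line, and the whole line is the comparison key.
sub write_missing_records {
    my ($ref_path, $other_path, $diff_path) = @_;

    # Only the reference file is held in memory, as hash keys.
    my %seen;
    open my $ref, '<', $ref_path or die "Cannot open $ref_path: $!";
    while (my $line = <$ref>) {
        chomp $line;
        $seen{$line} = 1;
    }
    close $ref;

    # The other file is streamed: one line in memory at a time,
    # and the differences go straight to disk.
    open my $in,  '<', $other_path or die "Cannot open $other_path: $!";
    open my $out, '>', $diff_path  or die "Cannot open $diff_path: $!";
    my $count = 0;
    while (my $line = <$in>) {
        chomp $line;
        next if exists $seen{$line};
        print {$out} "$line\n";
        $count++;
    }
    close $in;
    close $out or die "Cannot close $diff_path: $!";
    return $count;
}
```

Calling it once per remaining file, for example `write_missing_records('file1.txt', 'file2.txt', 'file2.diff')` (file names are placeholders), leaves one difference file per input on disk, ready for the second step.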
Another approach (especially if the files are truly huge) is to sort the files on the comparison key before the comparison and then read all of your files line by line in parallel. Sorting the files first has a cost, but it is often worth paying, because the multi-file comparison is then much faster. And, depending on where your files are coming from, some of them may already be sorted.
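For two files, the parallel read over sorted input could look roughly like the sketch below. The sub name `write_common_sorted` is hypothetical. It assumes both files are already sorted in the same order that Perl's `cmp` operator uses, that the whole line is the key, and that you want the common records; reporting differences instead only means printing in the other two branches.

```perl
use strict;
use warnings;

# Hypothetical helper: walks two sorted files in lockstep and writes the
# lines present in both to $common_path. Returns the number of lines written.
# Assumption: both inputs are sorted in the order defined by "cmp".
sub write_common_sorted {
    my ($path_a, $path_b, $common_path) = @_;

    open my $fa,  '<', $path_a      or die "Cannot open $path_a: $!";
    open my $fb,  '<', $path_b      or die "Cannot open $path_b: $!";
    open my $out, '>', $common_path or die "Cannot open $common_path: $!";

    # Only one line per file is in memory at any time.
    my $la = <$fa>;
    my $lb = <$fb>;
    my $count = 0;
    while (defined $la and defined $lb) {
        chomp(my $ka = $la);
        chomp(my $kb = $lb);
        my $order = $ka cmp $kb;
        if ($order < 0) {
            $la = <$fa>;          # key only in file A: advance A
        }
        elsif ($order > 0) {
            $lb = <$fb>;          # key only in file B: advance B
        }
        else {
            print {$out} "$ka\n"; # key in both files
            $count++;
            $la = <$fa>;
            $lb = <$fb>;
        }
    }
    close $fa;
    close $fb;
    close $out or die "Cannot close $common_path: $!";
    return $count;
}
```

Each file is read exactly once, so the comparison itself is linear in the total number of lines. The same idea extends to more than two files by always advancing the filehandle whose current key is smallest.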
Each case is different, so there is no general strategy that can be applied blindly to your specific problem. This is why I can't suggest a solution without knowing in detail what you're really comparing and what kind of differences (or common records) you're looking for.