in reply to Compare large files
But based on the data sample you showed in one of your replies here, it looks like the files are not sorted. So the problem you need to fix is in the program that produces these files -- they should be written in sorted order.
Then you can use the standard "diff" utility, which will correctly show the differences between the two sorted files.
And "diff" already knows how to manage big files -- it might take a while, but I'm pretty sure it will finish.
Also, it might help if you consider breaking your outputs into smaller pieces. How hard/bad would it be to have your directory scan process create 10 files of 100 MB each on average (or 100 files of 10 MB each on average)? I think the directory structure should provide a sensible way to do that...
(Update: In fact, it might be worthwhile to simply create one tabulation file per directory -- I believe you start with a list of the directories being scanned, so the task becomes: create and compare table files for each directory in the list. That should be pretty simple to maintain, and will run as quickly as any other approach.)
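As a rough illustration of what the per-directory version might look like (a sketch only -- the directory list, the file naming, and the size/path fields are assumptions, not your actual format):

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Find;

# Hypothetical setup: @dirs is the list of directories you already scan,
# and $run tags this pass (e.g. 'old' or 'new') so two runs can be compared.
my @dirs = ('/data/projects', '/data/archive');
my $run  = shift @ARGV || 'new';

for my $dir (@dirs) {
    (my $tag = $dir) =~ s{/}{_}g;          # flatten the path into a file name
    my $out = "tab${tag}.$run";

    open my $fh, '>', $out or die "Can't write $out: $!";
    find(sub {
        return unless -f $_;
        # one line per file: size and full path (sorted below)
        printf $fh "%d\t%s\n", -s $_, $File::Find::name;
    }, $dir);
    close $fh;

    system("sort -o $out $out") == 0 or die "sort failed on $out";
}

# Later, compare the matching per-directory files from the two runs, e.g.:
#   diff tab_data_projects.old tab_data_projects.new
```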
One last point, again based on the data sample you posted above. Are you sure that all differences are equally important and relevant? If so, using diff as-is is fine. If not, either adjust the script that creates these files so that unimportant differences never make it into the data, or write your own customized Perl variant of diff (or, better yet, a filter on the output of diff) to exclude the unimportant differences.
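If you go the filter route, it can be very small (a minimal sketch; what counts as "unimportant" -- here, a difference in a timestamp field -- is purely a placeholder you would replace with your own rule):

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Filter on diff output: skip differing records that only matter because
# of a timestamp.  The pattern is a placeholder for your own definition
# of an unimportant difference.
my $unimportant = qr/\b\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\b/;

while (my $line = <STDIN>) {
    # '<' and '>' lines carry the actual differing records in default diff output
    next if $line =~ /^[<>]/ && $line =~ $unimportant;
    print $line;
}
```

Run it as something like `diff old.sorted new.sorted | perl filter_diff.pl` -- though note this simple version drops only the individual '<'/'>' lines, so hunk headers for fully-suppressed hunks would still appear; a fancier version would buffer and drop whole hunks.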
Re^2: Compare large files
by boardryder (Novice) on Jul 10, 2009 at 00:48 UTC