in reply to Re: File comparison: not diff
in thread File comparison: not diff
But I see a problem: the files are large, about 500000 lines each, and each line is up to 1275 characters (about 250 on average). That's about 500000 x 250 x 2 = 238 Mbytes of data in the hashes, and the data will grow in the future. I think that will be rather stressful for the machine it's running on :) I think I'll try sorting the files beforehand and keeping only two lines in memory. Something like:
    # code fragment leaving out parsing, reporting, use strict etc.
    my ($leftkey,  $leftvalue)  = split_line(scalar <LEFTFILE>);
    my ($rightkey, $rightvalue) = split_line(scalar <RIGHTFILE>);

    # work through both files sequentially, matching advances by key
    while (defined($leftkey) and defined($rightkey)) {
        my $compare = $leftkey cmp $rightkey;
        if ($compare == 0) {
            # same key on both sides: report only if the values differ
            if ($leftvalue ne $rightvalue) {
                value_diff($leftkey, $leftvalue, $rightkey, $rightvalue);
            }
            ($leftkey,  $leftvalue)  = split_line(scalar <LEFTFILE>);
            ($rightkey, $rightvalue) = split_line(scalar <RIGHTFILE>);
            next;
        }
        elsif ($compare > 0) {
            # right key sorts first, so it has no match in the left file
            missing_left($rightkey, $rightvalue);
            ($rightkey, $rightvalue) = split_line(scalar <RIGHTFILE>);
            next;
        }
        else {
            # left key sorts first, so it has no match in the right file
            missing_right($leftkey, $leftvalue);
            ($leftkey, $leftvalue) = split_line(scalar <LEFTFILE>);
            next;
        }
    }

    # are there missing items at the end of the files?
    while (defined($leftkey)) {
        missing_right($leftkey, $leftvalue);
        ($leftkey, $leftvalue) = split_line(scalar <LEFTFILE>);
    }
    while (defined($rightkey)) {
        missing_left($rightkey, $rightvalue);
        ($rightkey, $rightvalue) = split_line(scalar <RIGHTFILE>);
    }
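To make the idea easier to try out, here is a minimal runnable sketch of the same loop. It assumes tab-separated `key<TAB>value` lines that are already sorted by key. The fragment above leaves `split_line` out, so the version here is only a guess at its shape, and `compare_sorted` and the report strings are invented for illustration. It reads from in-memory strings so it can be run without any files.

```perl
use strict;
use warnings;

# Hypothetical line format: "key<TAB>value". Returns an empty list at end
# of file, so the caller's key becomes undef and the loops stop.
sub split_line {
    my ($line) = @_;
    return unless defined $line;
    chomp $line;
    my ($key, $value) = split /\t/, $line, 2;
    return ($key // '', $value // '');
}

# Walk two key-sorted filehandles in step, holding one line from each in
# memory, and collect a list of differences.
sub compare_sorted {
    my ($left_fh, $right_fh) = @_;
    my @report;
    my ($lk, $lv) = split_line(scalar <$left_fh>);
    my ($rk, $rv) = split_line(scalar <$right_fh>);

    while (defined $lk and defined $rk) {
        my $compare = $lk cmp $rk;
        if ($compare == 0) {
            # Same key on both sides: report only if the values differ.
            push @report, "changed $lk" if $lv ne $rv;
            ($lk, $lv) = split_line(scalar <$left_fh>);
            ($rk, $rv) = split_line(scalar <$right_fh>);
        }
        elsif ($compare > 0) {
            # Right key sorts first: it is missing from the left file.
            push @report, "missing_left $rk";
            ($rk, $rv) = split_line(scalar <$right_fh>);
        }
        else {
            # Left key sorts first: it is missing from the right file.
            push @report, "missing_right $lk";
            ($lk, $lv) = split_line(scalar <$left_fh>);
        }
    }

    # Drain whichever file still has lines left.
    while (defined $lk) {
        push @report, "missing_right $lk";
        ($lk, $lv) = split_line(scalar <$left_fh>);
    }
    while (defined $rk) {
        push @report, "missing_left $rk";
        ($rk, $rv) = split_line(scalar <$right_fh>);
    }
    return @report;
}

# Made-up sample data, already sorted by key.
my $left  = "apple\t1\nbanana\t2\ncherry\t3\n";
my $right = "banana\t2\ncherry\t4\ndate\t5\n";
open my $left_fh,  '<', \$left  or die "left: $!";
open my $right_fh, '<', \$right or die "right: $!";

print "$_\n" for compare_sorted($left_fh, $right_fh);
```

With the sample data this prints `missing_right apple`, `changed cherry` and `missing_left date`. One thing to watch: the files have to be sorted in the same order that `cmp` uses. Without `use locale` that is plain character-code order, so if the sorting is done with the system `sort`, running it as `LC_ALL=C sort` is probably the safest way to get a matching order.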