in reply to using hash uniqueness to find the odd man

Here are two tests.

If both files (12000 lines each of rand 10e6 values) are presorted with Unix sort (70ms each on a PIII 500MHz), you can close in on where the byte streams diverge with a bisection: seek, read a character from each file, compare, and cut the range in half again (not a great algorithm, maybe). My Perl program took 10ms to get there, and then I stopped working on it; diff is faster, I think (under 1ms). The Perl overhead for just making those calls (see below) is under 1ms, so two sorts and a diff come to about 140ms, which is in the same ballpark as the hash time.

`sort oddbig > oddbigsort`;    # these are all backticks
`sort oddbig2 > oddbigsort2`;
@ans = `diff oddbigsort oddbigsort2`;
print "ANS: $ans[1]\n";
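I didn't keep the search program itself, but the guts were roughly this. A sketch only, using the same file names as the sort commands above; it assumes the two streams agree byte-for-byte before the divergence and disagree right after it, which is what makes cutting the range in half legal (and is only approximately true here, hence "not a great algorithm"):

#!/usr/bin/perl -w
use strict;

# File names as in the sort commands above.
my ($f1, $f2) = ("oddbigsort", "oddbigsort2");
open my $fh1, '<', $f1 or die "$f1: $!";
open my $fh2, '<', $f2 or die "$f2: $!";

# Bisect over byte offsets, staying within the shorter file.
my $limit = (-s $f1) < (-s $f2) ? -s $f1 : -s $f2;
my ($lo, $hi) = (0, $limit);
while ($lo < $hi) {
    my $mid = int(($lo + $hi) / 2);
    my ($c1, $c2) = ('', '');
    seek $fh1, $mid, 0;  read $fh1, $c1, 1;
    seek $fh2, $mid, 0;  read $fh2, $c2, 1;
    if ($c1 eq $c2) { $lo = $mid + 1 }   # still inside the common prefix
    else            { $hi = $mid }       # already past the divergence
}
print "byte streams diverge near offset $lo\n";
close $fh1;
close $fh2;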

But those hash slices are positively gorgeous. They get my vote!

Incidentally, japhy's code on the same system took 170ms; see my benchmark below. I wonder if the hash code involves something like a sort.

#!/usr/bin/perl -w
use strict;
use Benchmark;

my $t = &Benchmark::timeit(1, '
    my %seen;
    my $f1 = "oddbigsort";
    my $f2 = "oddbigsort2";
    open(A, $f1) or die "$f1: $!";
    open(B, $f2) or die "$f2: $!";
    my @smaller_list = <A>;
    my @larger_list  = <B>;
    # the names are only a guess; this picks whichever list is really longer
    if ($#larger_list < $#smaller_list) {
        @seen{@smaller_list} = ();     # a key for every line of the longer file
        delete @seen{@larger_list};    # knock out the lines they share
    }
    else {
        @seen{@larger_list} = ();
        delete @seen{@smaller_list};
    }
    my @culprits = keys %seen;         # whatever survives is the odd man
    foreach (@culprits) { print "$_\n"; }
    close(A);
    close(B);
');
print timestr($t), "\n";
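(For anyone following along: the slice assignment @seen{@larger_list} = () makes a hash key out of every line of the longer file, delete @seen{@smaller_list} then knocks out all the lines the two files share, and whatever keys survive are the odd men out. As far as I know there's no sort hiding in there, just a hash of each of the 24000 lines, which would account for the 170ms.)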