Re: Re: using hash uniqueness to find the odd man

Your code runs in O(@a * @b) time because of the grep nested inside the foreach. The hash approach will run in O(A + @b), where A is the time to build the hash %seen out of the values of @a (probably about @a*log(@a)). Hashes should be faster if the data sets are somewhat large.

If the two lists of DNs are sorted initially, you can use a merge algorithm (as in the unix comm program). That is an optimal solution, because it takes O(@a + @b) time, and there's no way to solve the problem without examining everything in both lists. If you have to sort the lists first, that's more expensive than the hash method.

Update: Now that I'm getting analysis of algorithms paged back in, I'm thinking A should have been @a in the first paragraph. The hashtable has to expand about log(@a) times, but each time it only has to reconsider the elements that were previously inserted, so it's @a + @a/2 + @a/4 + ... = @a*2, which is proportional to @a.

Comment on Re: Re: using hash uniqueness to find the odd man

Replies are listed 'Best First'.
Re: Re: Re: using hash uniqueness to find the odd man by tachyon (Chancellor) on Jul 11, 2001 at 05:26 UTC
Yeah I agree it is not the best solution, I kinda mentioned it might well be slow. But today is grep week so what can you do? As you point out it will slow in a geometric fashion. I have added an update that uses a hash solution - with grep of course :-) Update Just benchmarked the code on a 10,000 element data set, results not as bad as I feared - the iterative method is only 3 orders of magnitude slower. I think three orders of magnitude sounds better than 1000 times! cheers tachyon s&&rsenoyhcatreve&&&s&n.+t&"$'$`$\"$\&"&ee&&y&srve&&d&&print	[reply]

Replies are listed 'Best First'.

Re: Re: Re: using hash uniqueness to find the odd man
by tachyon (Chancellor) on Jul 11, 2001 at 05:26 UTC

Yeah I agree it is not the best solution, I kinda mentioned it might well be slow. But today is grep week so what can you do? As you point out it will slow in a geometric fashion. I have added an update that uses a hash solution - with grep of course :-)

Update

Just benchmarked the code on a 10,000 element data set, results not as bad as I feared - the iterative method is only 3 orders of magnitude slower. I think three orders of magnitude sounds better than 1000 times!

cheers

tachyon

s&&rsenoyhcatreve&&&s&n.+t&"$'$`$\"$\&"&ee&&y&srve&&d&&print

[reply]