note
ariels
<p>
Doing it in C and with moderate performance requirements (e.g. ~100_000 elements in each list), I'd go [samtregar]'s way and compare both lists sorted (this is probably faster than sorting the one and binary-searching the other through it when both lists have lengths of the same order).
</p>
<p>
Doing it in C, but in a high performance situation, I'd try to <em>change the data representation</em>. If we can recast the problem such that the lists can be stored as bit strings, <strong>nothing</strong> can beat an AND. On a modern processor I can compare 64 elements at a time; unroll the loop 4--8 times and you'll be running about as fast as is possible.
</p>
<p>
If elements of the lists are integers in a known range, this method is obviously suitable. In other cases, it might be possible to use a hash table (probably called a "symbol table") to translate list elements into indices, and then represent each list as a bitmap.
</p>
<p>
Note that this approach might or might not be suitable, depending on the precise character of the code. Unfortunately, high-performance solutions often suffer from this problem.
</p>
<p>
[samtregar]'s solution is also useful in another advanced programming environment. In a shell script, to get common elements of files <samp>AAA</samp> and <samp>BBB</samp> efficiently and in sorted order, try
<blockquote><code>
sort -u AAA > AAA.sorted
sort -u BBB > BBB.sorted
comm -12 AAA.sorted BBB.sorted > common
</code></blockquote>
This technique works amazingly well, in fact.
</p>
162711
162711