in reply to Large Set Efficiency Question

I'm assuming you have enough memory to store the smaller dataset in a hash. After all, your code already holds both datasets in arrays.

Instead of storing the smaller dataset in an array (@data), store it in a hash (%data) keyed on $id . $datestamp.

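Then read in each $ref, build the same key from it, and test whether that key exists in %data. If it doesn't, print $ref or push it onto an array for later processing. Each test is a single hash lookup instead of a scan through the whole of @data, so the work grows with the size of the two datasets rather than with their product.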
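Here's a rough sketch of the approach. The record layout is an assumption: the smaller set is held as [ $id, $datestamp ] pairs, and the larger set arrives as tab-separated "id, datestamp" lines. An in-memory filehandle stands in for your real input file. The sketch also joins the two fields with "\0" instead of concatenating them directly. Without a separator, "1" . "23" and "12" . "3" would produce the same key.

```perl
use strict;
use warnings;

# Hypothetical smaller dataset: each record is [ $id, $datestamp ].
my @data = ( [ 101, '2004-03-01' ], [ 102, '2004-03-02' ] );

# Build the lookup hash once. The "\0" separator keeps "1"."23" and "12"."3" distinct.
my %data = map { ( "$_->[0]\0$_->[1]" => 1 ) } @data;

# Hypothetical larger dataset, one "id<TAB>datestamp" record per line.
# An in-memory filehandle replaces the real file so the sketch runs as-is.
my $input = "101\t2004-03-01\n103\t2004-03-05\n";
open my $fh, '<', \$input or die "Cannot open input: $!";

my @missing;
while ( my $ref = <$fh> ) {
    chomp $ref;
    my ( $id, $datestamp ) = split /\t/, $ref;
    # One hash lookup per record instead of a scan over @data.
    push @missing, $ref unless exists $data{"$id\0$datestamp"};
}
close $fh;

print "$_\n" for @missing;   # prints 103<TAB>2004-03-05
```

Only %data has to stay in memory. The larger set is streamed one line at a time, so you could also print each unmatched $ref as you read it instead of collecting it in @missing.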

--Jim

Update: Oh well, a conversation with the PHB slowed my reply, but it looks like you've already got what you need. :)