I'm going to write a little script that will generate random sequences of letters and then work with an outside program to analyze these sequences. The entire sequence space contains more than 4x10^15 possible sequences, so I'll want to test more than 100,000 of them.
But I only want to test each sequence once. My first thought was to put each analyzed sequence into a hash and then check the hash for a match before starting analysis of a new sequence. Is there a point, though, where a hash becomes problematic in terms of size? My other thought was to just check a file (all sequences and the analysis results will have to be written to a file) for the new sequence. This seems like it would take quite a bit longer, however.
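To make the first idea concrete, here is a minimal sketch in Python, using a set as the hash of sequences already seen. The alphabet, the sequence length, and the function name `unique_random_sequences` are all assumptions for illustration, since the real values depend on the analysis program.

```python
import random

# Hypothetical values for illustration only; substitute the real alphabet and length.
ALPHABET = "ACGT"
SEQ_LENGTH = 26


def unique_random_sequences(count, rng):
    """Yield `count` distinct random sequences, skipping any already seen."""
    seen = set()  # hash-based lookup: membership tests take constant time on average
    while len(seen) < count:
        seq = "".join(rng.choice(ALPHABET) for _ in range(SEQ_LENGTH))
        if seq in seen:
            continue  # duplicate, draw again
        seen.add(seq)
        yield seq


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so a run can be repeated
    for seq in unique_random_sequences(5, rng):
        print(seq)  # this is where the sequence would go to the outside program
```

With a sample this small relative to the sequence space, duplicates should be rare, and a few hundred thousand short strings should fit comfortably in memory on an ordinary machine. Checking the results file instead would mean rereading it for every new sequence, which is why it looks slower.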