in reply to Re: Bioinformatic task
in thread Bioinformatic task

typically, how many sequences are in the input file, and how long is each DNA chain?
what does your "compare" operation do? if it's testing for an exact match, then you don't need a separate comparison pass, as you can test whether a sequence key already exists in the hash before assigning it. with that test for prior existence in place, you're effectively checking each incoming sequence against every sequence already loaded in the hash.
and if the comparison of the DNA chains is more involved, then another data structure better suited to that comparison could make the task more manageable and efficient.
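
to make the exists-before-assign idea concrete, here is a minimal sketch. it is written in Python rather than Perl purely for illustration, with a dict standing in for the hash. the function name `load_unique`, the (id, sequence) record layout and the sample data are all assumptions, not anything from the original script.

```python
def load_unique(records):
    """Load (seq_id, sequence) pairs, reporting exact duplicates.

    The record layout and all names here are hypothetical.
    """
    seen = {}        # sequence -> id of the first record that carried it
    duplicates = []  # (duplicate id, id of the earlier identical sequence)
    for seq_id, seq in records:
        if seq in seen:
            # Key already exists: exact match with an earlier sequence.
            duplicates.append((seq_id, seen[seq]))
        else:
            seen[seq] = seq_id
    return seen, duplicates


# Made-up sample data, for illustration only.
records = [("s1", "ACGT"), ("s2", "TTGA"), ("s3", "ACGT")]
seen, duplicates = load_unique(records)
print(duplicates)
```

each lookup is a single hash probe, so no pairwise loop over the already-loaded sequences is needed for the exact-match case.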
the hardest line to type correctly is: stty erase ^H

Re^3: Bioinformatic task
by aquarium (Curate) on Nov 08, 2010 at 21:57 UTC
    a straight equality comparison can also be done with the unix/linux utilities sort | uniq -c, which will give you a count for each distinct sequence (any count above 1 means the sequence is repeated). alternatively, you could work with the sort output directly, using a script that notices when the sequence changes from the last one read. for that, store the sequence in the first column and the sequence identifier in the second, and sort on the first column (the sequence) only.
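
the "notice when the sequence changes" approach might look roughly like the sketch below. it is in Python for illustration only and assumes tab-separated lines with the sequence in the first column and the identifier in the second. the function name `group_sorted` and the sample lines are hypothetical, and the in-memory sort stands in for running the file through sort first.

```python
def group_sorted(lines):
    """Yield (sequence, [ids]) from 'sequence<TAB>id' lines sorted by sequence."""
    current = None  # sequence seen on the previous line
    ids = []        # identifiers collected for the current sequence
    for line in lines:
        seq, seq_id = line.rstrip("\n").split("\t")
        if seq != current:
            # The sequence changed: emit the finished group, start a new one.
            if current is not None:
                yield current, ids
            current, ids = seq, []
        ids.append(seq_id)
    if current is not None:
        yield current, ids  # emit the final group


# Made-up sample lines; sorting on the first column only, as suggested above.
lines = ["ACGT\ts1", "TTGA\ts2", "ACGT\ts3"]
lines.sort(key=lambda l: l.split("\t")[0])
groups = list(group_sorted(lines))
print(groups)
```

since only the previous sequence and its ids are held at any time, this works on pre-sorted input that is too large to load into a hash.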