in reply to unique sequences

It's hard to tell from the source what the input looks like, and what should be done...

Do I understand it right that
1) The input consists of [ACGT] sequences plus some comment lines starting with >?
2) You need to find strings of 12 nucleotides ending in GG that occur in the input only once?
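If that reading is right, the counting itself is straightforward. Below is a minimal sketch in Python (not the original poster's Perl). It assumes a few things that are not stated in the question: windows overlap, a 12-mer may not span two records (each `>` line starts a new sequence), letters are compared case-insensitively, windows containing anything other than A/C/G/T (e.g. N) are skipped, and only the forward strand is counted. The function names are hypothetical.

```python
from collections import Counter
from typing import Iterable, Iterator

ACGT = frozenset("ACGT")

def read_sequence_lines(lines: Iterable[str]) -> Iterator[str]:
    """Yield one concatenated, upper-cased sequence per FASTA record,
    treating every line that starts with '>' as a record header."""
    chunks: list[str] = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if chunks:
                yield "".join(chunks).upper()
                chunks = []
        else:
            chunks.append(line)
    if chunks:
        yield "".join(chunks).upper()

def unique_gg_kmers(lines: Iterable[str], k: int = 12) -> set[str]:
    """Return the k-mers ending in GG that occur exactly once in the input
    (overlapping windows, forward strand only, pure ACGT windows only)."""
    counts: Counter[str] = Counter()
    for seq in read_sequence_lines(lines):
        for i in range(len(seq) - k + 1):
            kmer = seq[i:i + k]
            if kmer.endswith("GG") and set(kmer) <= ACGT:
                counts[kmer] += 1
    return {kmer for kmer, n in counts.items() if n == 1}
```

For a whole genome the `Counter` holds every distinct GG-ending 12-mer, so memory use can be substantial. If the reverse strand (CC-starting windows) also matters, it would need to be counted as well.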

It would be nice if you posted a short input sample (although I guess it may not reproduce the discrepancy you're seeing).

Re^2: unique sequences
by erix (Prior) on Dec 11, 2017 at 08:58 UTC

    This kind of dmel-all-chromosome file can be found here:

    ftp://ftp.flybase.net/releases/current/dmel_r6.18/fasta/

    dmel-all-chromosome-r6.18.fasta.gz

    That's a few releases later, but I suppose that doesn't matter for the processing solution. (The older releases should be in there somewhere as well.)
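The download is gzip-compressed, so it doesn't have to be unpacked first; it can be streamed record by record. Here is a minimal sketch in Python (not Perl), assuming standard FASTA layout (`>` header lines followed by wrapped sequence lines). The function name `iter_fasta_records` is hypothetical, and it keeps one full record (one chromosome arm) in memory at a time.

```python
import gzip
from typing import Iterator

def iter_fasta_records(path: str) -> Iterator[tuple[str, str]]:
    """Stream (header, sequence) pairs from a plain or gzip-compressed FASTA file.
    Lines before the first '>' header are ignored."""
    opener = gzip.open if path.endswith(".gz") else open
    header = None
    chunks: list[str] = []
    with opener(path, "rt") as fh:  # "rt" = text mode for both gzip.open and open
        for line in fh:
            line = line.rstrip()
            if line.startswith(">"):
                if header is not None:
                    yield header, "".join(chunks)
                header = line[1:]
                chunks = []
            elif line and header is not None:
                chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)
```

Usage, assuming the file has been downloaded to the current directory:

```python
for header, seq in iter_fasta_records("dmel-all-chromosome-r6.18.fasta.gz"):
    print(header.split()[0], len(seq))
```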

      That's a few releases later...

      So basically species have releases, and are versioned these days. I for one would love to git revert a mammoth!