in reply to Window size for shuffling DNA?
Let's see if I understand your question, by summarising the descriptions you've given.
You start with a real DNA sequence of ~100MB, which you pass to a 3rd-party program. That program searches it for a particular sequence (or sequences?) between 20 bytes and 20 kbytes in length, identified by the presence of two sequences (of ~50 bytes each), one at either end of the wanted sequence.
E.g.
...xxxxxxxxHEADERxxxxx 20-20k bytes xxxxxTRAILERxxxxxxxxxx....
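To make sure we are picturing the same thing, here is a rough Python sketch of that kind of search. It is only my guess at the behaviour: I don't know how the 3rd-party program actually works, and the function name `find_delimited`, the exact-match regex approach, and the short toy header/trailer strings are all assumptions made for illustration (the real delimiters are ~50 bytes and may well be matched approximately).

```python
import re

def find_delimited(seq, header, trailer, min_len=20, max_len=20_000):
    """Return (start, end) spans of stretches flanked by header and trailer.

    Hypothetical helper: exact matching only, non-overlapping hits,
    and the shortest qualifying stretch after each header wins.
    """
    pattern = re.compile(
        re.escape(header)
        + r"(.{%d,%d}?)" % (min_len, max_len)  # the 20..20k wanted region
        + re.escape(trailer),
        re.DOTALL,
    )
    return [m.span(1) for m in pattern.finditer(seq)]

# Toy example: 4-byte delimiters instead of ~50-byte ones.
toy = "TT" + "ACGT" + "G" * 25 + "TGCA" + "TT"
spans = find_delimited(toy, "ACGT", "TGCA")
```

On the toy string, `spans` holds one pair of offsets covering the 25 bytes between the two delimiters; a stretch shorter than 20 bytes between the same delimiters would not be reported.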
From your graph, I suspect that you run this process on several (3 shown) real DNA sequences?
Further, I suspect that the (unstated) aim of this process is to identify the ~50-byte header and trailer sequences that are common to all the different DNA sequences, and that delimit a 'common subsequence' of DNA across species?
(Do you supply the header & trailer sequences to the 3rd party program?)
The purpose of your windowed shuffling process is to mix up the real DNA in a way that is randomised but remains locally, statistically similar, in order to eliminate false positives. That is, if the header and trailer sequences you've previously identified are still found in the randomised sequences, then they are probably not good candidates for identifying common sequences.
And your question is whether the way you are randomising the sequences, via this windowing mechanism, is statistically valid.
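For concreteness, this is how I picture the windowed shuffle, as a minimal Python sketch. It assumes fixed-size, non-overlapping windows with the bases permuted independently inside each one; the name `windowed_shuffle` and the `window` and `rng` parameters are mine, and your actual code may differ (overlapping windows, k-mer-preserving shuffles, etc.).

```python
import random

def windowed_shuffle(seq, window, rng=None):
    """Shuffle seq within consecutive, non-overlapping windows.

    Each window keeps its exact base composition, so local statistics
    such as GC content are preserved at the scale of `window`, while
    the order of bases inside the window is randomised.
    """
    rng = rng or random.Random()
    pieces = []
    for start in range(0, len(seq), window):
        chunk = list(seq[start:start + window])  # last chunk may be shorter
        rng.shuffle(chunk)                       # in-place permutation
        pieces.append("".join(chunk))
    return "".join(pieces)

# Example: a seeded generator makes the shuffle repeatable.
shuffled = windowed_shuffle("AAAACCCCGGGGTTTT", window=4, rng=random.Random(1))
```

With a window of 4 on that example every window holds a single base type, so the output equals the input. That is the extreme case of the trade-off: the smaller the window, the more of the original sequence (and any real header or trailer in it) survives the shuffle.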