FDR = False Discovery Rate = (# of false positives / total # of positives reported) * 100 %
LCV = Local Combinational Variables. I had never heard of it before. The author of the software has an earlier paper that uses it for another bioinformatics purpose: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC2761287/ I am not sure it is even directly relevant, but I am throwing it out there.
In DNA sequence parlance, 1 letter = 1 base pair = 1bp (abbreviation)
Therefore 1000bp = 1 kilobase pair = 1KB (abbreviation; more commonly written kb)
Likewise 10^6bp = 1 megabase pair = 1MB (more commonly written Mb), and so on...
pHMM = profile Hidden Markov Model, used to build probabilistic models of insertions, deletions and substitutions in multiple alignments of protein or DNA sequences. More information can be found at http://en.wikipedia.org/wiki/Hidden_Markov_model However, this may be a distraction, since the software does NOT use pHMMs but LCVs, for which I cannot find any theory from a Google search alone.
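To make the FDR definition above concrete, here is a minimal Python sketch. The function name `fdr_percent` and the numbers in the usage line are hypothetical and only for illustration; they are not results from the software. It assumes the denominator is the total number of hits reported, of which the false positives are a subset.

```python
def fdr_percent(false_positives: int, total_hits: int) -> float:
    """Return the false discovery rate as a percentage.

    Assumption: total_hits counts every reported hit (true + false),
    and false_positives is the subset known or estimated to be false.
    """
    if total_hits <= 0:
        raise ValueError("total_hits must be positive")
    if not 0 <= false_positives <= total_hits:
        raise ValueError("false_positives must be between 0 and total_hits")
    return 100.0 * false_positives / total_hits


# Illustrative (made-up) counts: 5 false hits out of 100 reported hits.
print(fdr_percent(5, 100))
```

How the false positive count is obtained (for example, from hits in a shuffled sequence) is a separate question that this sketch does not address.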
The head LCVs library contains 304 different header sequences
The tail LCVs library contains 576 different trailer sequences
Length variation of header sequences in the head LCVs library: ~10 - 50
Length variation of trailer sequences in the tail LCVs library: ~10 - 50
In step 1, the 3rd party software detects matches to the head LCVs and, separately, matches to the tail LCVs
After step 1, the software, in step 2, joins these heads and tails into pairs. The default parameters limit the intervening length between any given head and tail to between 20 and 20,000 letters. In other words, head and tail combinations shorter than 20bp or longer than 20KB are ignored. Please note that the software is NOT looking in any form or manner for matches to ANY intervening sequence between the head and tail matches. It is ONLY looking for matches to the head and tail LCVs per se, then pairing them, and then imposing the size range (20bp-20KB) filter to report the elements
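The pairing in step 2, as I understand it, can be sketched in Python as follows. This is NOT the software's actual code: the function name, the `(start, end)` hit representation, and the choice to measure the intervening length from the end of the head to the start of the tail are all my assumptions. It also assumes heads and tails lie on the same strand with the tail downstream of the head.

```python
from typing import List, Tuple

# A hit is (start, end) in 0-based, half-open coordinates (assumption).
Hit = Tuple[int, int]


def pair_heads_and_tails(
    heads: List[Hit],
    tails: List[Hit],
    min_gap: int = 20,       # default lower limit from the post
    max_gap: int = 20_000,   # default upper limit from the post
) -> List[Tuple[Hit, Hit]]:
    """Pair every head hit with every downstream tail hit whose
    intervening length falls within [min_gap, max_gap].

    The intervening sequence itself is never inspected; only the
    distance between the two matches is used.
    """
    pairs = []
    tails_sorted = sorted(tails)  # sorted by start, so gaps only grow
    for head in sorted(heads):
        for tail in tails_sorted:
            gap = tail[0] - head[1]  # letters between head end and tail start
            if gap < min_gap:
                continue  # too close (or upstream / overlapping)
            if gap > max_gap:
                break     # all later tails are even further away
            pairs.append((head, tail))
    return pairs


# Hypothetical coordinates, for illustration only.
heads = [(0, 10)]
tails = [(15, 25), (40, 50), (30_000, 30_010)]
print(pair_heads_and_tails(heads, tails))
```

With these made-up coordinates, only the tail at (40, 50) pairs with the head: the first tail is 5 letters away (too short) and the last is 29,990 letters away (too long).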
IMO the tailing off of the graph is not meaningless, for the following reason: I ran these tests on randomized DNA sequences from completely different species (3 shown in the figure, 2 others not in the figure), and all 5 show the exact same trend. It would be unlikely to see the exact same trend for all 5 species, unless this itself is a random event... So there is something going on that I don't understand, and maybe it is not easy to explain...
In reply to Re^4: Window size for shuffling DNA?
by onlyIDleft
in thread Window size for shuffling DNA?
by onlyIDleft