in reply to Re^7: Window size for shuffling DNA?
in thread Window size for shuffling DNA?

Thank you for your detailed and patient explanation.

Several of my suspicions are confirmed.

I must point out that I already knew at the outset that comparing original and shuffled DNA would NOT allow identification and/or elimination of the false positives, only a calculation of the percentage of elements that are likely to be false positives. Your final reply confirms that unequivocally.

You assert that because there is no predictive power due to the shuffling, this is useless. As a biologist, I would argue against that claim. When you need to experimentally verify a set of predictions, I would take a method that yields ~ 10% FDR over another method that suffers from ~ 40%. That way time, energy and resources are better utilized. So it does not matter so much which ones are real or not; if a large enough sample is cross-verified experimentally, it should check out as per the theoretical FDR predictions. If it does not, something is wrong with the computation pipeline, the experimental verification protocol, or both.
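To make the resource argument concrete, here is a minimal sketch of the arithmetic, assuming the FDR estimate is actually correct (which is the point under dispute in this thread). The function name and the sample size of 100 are hypothetical and only for illustration.

```python
def expected_false_positives(n_validated: int, fdr: float) -> float:
    """Expected number of experimentally tested predictions that are false,
    assuming the stated FDR is accurate (an assumption, not a given)."""
    if n_validated < 0:
        raise ValueError("n_validated must be non-negative")
    if not 0.0 <= fdr <= 1.0:
        raise ValueError("fdr must lie in [0, 1]")
    return n_validated * fdr


if __name__ == "__main__":
    n = 100  # hypothetical number of predictions sent for validation
    for label, fdr in (("~10% FDR", 0.10), ("~40% FDR", 0.40)):
        wasted = expected_false_positives(n, fdr)
        print(f"{label}: about {wasted:.0f} of {n} validations expected to fail")
```

If the observed failure count in a large validated sample departs strongly from this expectation, that would suggest a problem in the pipeline, the protocol, or the FDR estimate itself.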

What is interesting, however, is your clear statement that since one sample is original and the other is shuffled, calculating the FDR as the number of elements found in the shuffled DNA vs. the original DNA is completely bogus! :) In my experience, this is how FDRs for sequence-based analyses have been reported in the published literature. I do not know if there are other viable methods to assess FDR. But your statement is of concern, as it suggests a disconnect between the theory of FDR and how it is applied by biologists.
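For reference, the ratio described above can be sketched as follows. This is only an illustration of the calculation as stated in this post (hits in shuffled DNA divided by hits in original DNA), not an endorsement of its validity; the function name and the capping at 1.0 are assumptions made for the sketch.

```python
def empirical_fdr(hits_original: int, hits_shuffled: int) -> float:
    """Shuffle-based FDR estimate: elements found in the shuffled sequence
    divided by elements found in the original sequence, capped at 1.0."""
    if hits_original <= 0:
        raise ValueError("hits_original must be positive")
    if hits_shuffled < 0:
        raise ValueError("hits_shuffled must be non-negative")
    return min(1.0, hits_shuffled / hits_original)


if __name__ == "__main__":
    # Hypothetical counts, chosen only to show the calculation.
    print(empirical_fdr(hits_original=200, hits_shuffled=20))
```

Note that the estimate depends on the particular shuffle used, so different shuffling settings can yield different percentages for the same original data.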

In any case, your replies were all enlightening. Thank you for taking the time to reply with patient explanations. I have enough fodder to go beyond my confusion and proceed with my analyses. Cheers!

Re^9: Window size for shuffling DNA?
by BrowserUk (Patriarch) on May 21, 2015 at 16:59 UTC
    As a biologist, I would argue against that claim. When you need to experimentally verify a set of predictions, I would take a method that yields ~ 10% FDR over another method that suffers from ~ 40%. That way time, energy and resources are better utilized.

    You're right. I'm not a biologist, but, please think again.

    For each of your 3 species, you have a single "actual discoveries" figure; but 7 different %FDRs.

    The original data and discoveries don't change, so at best only one of those 7 numbers could possibly be right, and which one could be different for each of the three species. Or they could all be wrong.

    Picking any of them because it is convenient is just wishful thinking.

    And basing your experimental strategy upon a guess (it is nothing more) because it will involve less work completely subverts the scientific method.

    I'll shut up now.

