in reply to Re: how to isolate text in a text file.
in thread how to isolate text in a text file.

I'm happy to see this post and will follow this thread. I'm interested in genealogy, but haven't had a place to start with it in Perl. A few questions:

Q1) If that text were the entire gene sequence for Homo sapiens, how long would it be? How much does the length of the sequence representing a given person vary across our populations? Does it vary by sex?

Q2) For analyzing one's 23andMe data, how is it represented and searched?

Q3) Is there a namespace on CPAN developed for frequent tasks in this realm?

Thank you for your comments,

Re^3: how to isolate text in a text file. (DNA related modules)
by hippo (Archbishop) on Dec 15, 2018 at 11:22 UTC
    Q3) Is there a namespace in cpan developed for frequent tasks in this realm?

    Well, there is BioPerl, the source for which is on CPAN. There are plenty of other dists in the Bio tree as well.

Re^3: how to isolate text in a text file.
by bliako (Abbot) on Dec 15, 2018 at 11:36 UTC

    Hi Aldebaran,

    I am in no way an expert in biology; I have only touched some of its more interesting parts through bioinformatics. There is a huge collection of computational and data resources available through the software package R, which is not Perl but is still free software. Through R one can download sequences, compare them, etc. BUT it has a steep learning curve, almost impossible to climb. Then, of course, there is the great effort of BioPerl at www.bioperl.org. Where I worked we all used R, though, so I have had little exposure to BioPerl.

    I will start with Vavilov's observation that a species' greatest genetic variation is found where the species originated. I don't know how far this has been verified by data, but many people certainly consider the theory valid. The greatest variation in the genome of Homo sapiens (HS) is found in sub-Saharan Africa. Update: Vavilov was mainly talking about plants. Transferring this to nomadic animals, including humans, may be a bit misleading, because animals travel, so correlating location with DNA will not work in many cases, especially today.

    Then I will mention that genetic variation starts with mutations caused by external factors, like radiation and chemical exposure. This need not be dramatic exposure: note that it was happening thousands of years ago, when pollution was not a problem. As for radiation, we get it from the stars and from the Earth itself.

    It is believed (as in "who knows?", typical of the bio fields) that HS (mammal) females have a fixed stock of eggs and release one each month, whereas HS males replenish sperm at regular, short intervals. This means, I think, that female eggs accumulate mutations over their reproductive life, whereas a sperm cell may get a mutation but, if not used, it is soon replaced and the mutation has no chance to be passed on to the offspring. (BTW, mutations can lead to good as well as bad phenotypes (traits); probability favours the bad, as usual ;( .) So, I say, the contributions of the female and the male to their offspring's DNA are qualitatively different, as they reflect nature's past events at different time scales. Perhaps related: today we live twice as long as we did 5000 years ago (only in some countries, unfortunately), which means more accumulated mutations.

    In conjunction with mutations there are also mating and environmental selection. Some cells in the gonads of a member of the species acquire a mutation. That member finds a mate carrying a different mutation; they try to mate and, if successful, the offspring inherits the mutation. The offspring grows (or dies, if the mutation kills it) and then the environment takes over, test-driving it. Sometimes it crashes the individual into Kaiadas, the Chasm of Death, before it has had a chance to mate; other times it crowns it king of the jungle, fertilising hundreds of eggs and passing on its genetic footprint. So, lots of chance events lead to lots of genetic variation.

    Mendel's genetic theories are too simplistic for today's data, if they were ever true outside his monastery's walls. Yet they are still taught in school, spreading the belief that a single gene switching ON/OFF is responsible for one physical characteristic (and if we find that gene we will cure blahblah, please donate, yeah right). So much of this rubbish has been installed in human brains that it will take a while to remove it and look at genetics as a complex system where everything plays a role and things are less binary than is commonly thought. The binary view persists mainly because people in the field are practical: if they can't handle the complexity, they create a less complex universe and happily live in it. Looking at genetics as a whole remains a challenge, both in persuading biologists to even consider it and in applying it in practice and dealing with its immense complexity.

    Wikipedia says that typical genetic variation is about 0.6% of the 3.2 billion bases comprising HS DNA. A base is one of A, T, G, C. A gene is a small sub-sequence of bases (roughly 10^2 to 10^6 bases long) within the whole genome and is the blueprint cells use to make proteins. So a defective gene means a defective recipe, which may lead to defective proteins.
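    To put the Q1 numbers in perspective, here is a quick back-of-the-envelope sketch (in Python rather than Perl), using the round figures above as assumptions. Because a base is one of only four letters, it fits in 2 bits. The `ENCODE` mapping and `pack` helper are hypothetical, just to show the idea, not how any particular tool or file format stores genomes.

```python
# Back-of-the-envelope sizes for the figures quoted above (3.2 billion
# bases, ~0.6% typical variation). These are assumed round numbers.
GENOME_BASES = 3_200_000_000
VARIATION_FRACTION = 0.006
BITS_PER_BASE = 2  # 4 symbols (A, C, G, T) fit in 2 bits

# Hypothetical 2-bit code for each base; any fixed mapping would do.
ENCODE = {"A": 0b00, "C": 0b01, "G": 0b10, "T": 0b11}


def pack(seq: str) -> int:
    """Pack a DNA string into one integer, 2 bits per base, first base highest."""
    value = 0
    for base in seq:
        value = (value << 2) | ENCODE[base]
    return value


varying_bases = round(GENOME_BASES * VARIATION_FRACTION)
packed_bytes = GENOME_BASES * BITS_PER_BASE // 8
text_bytes = GENOME_BASES  # one ASCII character per base, ignoring newlines

print(f"bases that typically differ: {varying_bases:,}")
print(f"2-bit packed genome: {packed_bytes / 1e6:,.0f} MB")
print(f"one-char-per-base text: {text_bytes / 1e9:.1f} GB")
print(f"pack('ACGT') = {pack('ACGT')} = {pack('ACGT'):08b}")
```

    Run as-is, it prints 19,200,000 varying bases, 800 MB for the 2-bit packed sequence and 3.2 GB as one character per base, and shows `ACGT` packing to 27 (binary 00011011).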

    Proteins serve several roles. I think two important ones are acting as messengers for signals between and inside cells, and causing or catalysing chemical reactions. Much of their function comes from their physical structure and electro-chemical properties. Therefore, a single wrong molecule (amino acid) placed in the protein may cause it to like water rather than hate it (a major distinction between proteins) and so function completely differently. However, there are examples of both extreme fault-tolerance and extreme fault-intolerance. Again, even with lots of data, scientific conclusions tend to differ depending on the year, the place and who drew them. There is something for everyone when the machinery is oiled by Doleros Inc.
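    To make the "one wrong molecule" point concrete, here is a toy sketch in Python (not Perl, and not BioPerl's API) in which a single-letter change in the DNA recipe swaps one amino acid in the protein. The sequence is made up and the codon table is a deliberately partial subset. The GAG to GTG change itself is the one behind sickle-cell anaemia, where a water-loving glutamate in beta-globin becomes a water-hating valine.

```python
# Deliberately tiny, partial codon table: only the codons used in this example.
# A real translation needs all 64 codons.
CODON_TABLE = {
    "ATG": "Met",  # methionine
    "CCT": "Pro",  # proline
    "GAG": "Glu",  # glutamate: electrically charged, water-loving
    "GTG": "Val",  # valine: water-hating (hydrophobic)
    "AAG": "Lys",  # lysine
}


def translate(dna: str) -> list[str]:
    """Translate a DNA coding sequence, three bases (one codon) at a time."""
    return [CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna) - 2, 3)]


def substitute(dna: str, pos: int, base: str) -> str:
    """Return a copy of dna with the base at 0-based position pos replaced."""
    return dna[:pos] + base + dna[pos + 1:]


normal = "ATGCCTGAGAAG"              # hypothetical toy sequence
mutant = substitute(normal, 7, "T")  # middle base of GAG: A -> T gives GTG

for i, (a, b) in enumerate(zip(translate(normal), translate(mutant)), start=1):
    marker = "  <-- changed" if a != b else ""
    print(f"codon {i}: {a} -> {b}{marker}")
```

    Only codon 3 changes (Glu to Val); the rest of the toy protein is untouched, which is exactly why one letter out of billions can matter so much, or not at all, depending on where it lands.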

    I like to think of biological systems as analogue computers, but I see nothing exciting in cracking them open just for the hacking challenge.

    Instead, I get more excitement from working on the macro scale of society. There is a reason why, in the old story, Prometheus gave fire to humans rather than genetically engineering them so that their middle finger acts as a lighter.

    bw, bliako