vouchsafing has asked for the wisdom of the Perl Monks concerning the following question:

Hey. I'm totally new to programming in Perl, but I have to do a project and I have no idea how to go about it. Maybe someone can help me. The script has to load several sequences in FASTA format from a single file. Here is an example input file (for example 'gens.txt'):
>1
AGTATCGGACCCGAAGACATTACGCTTAGAGACTTGAAAA
CCTACAGTAAAGAAGCAGCGTCTGGATATCTGGAAGACAA
CGGATTGAAGCTTGTAGAAAAAGAAGCATACTCAGATGAT
GTTCCAGAAGGACAGGTTGTCAAACAAAAACCAGCAGCAG
GTACGGCAGTAAAGCCGGGAAACGAAGTTGAAGTGACATT
CTCTCTCGGACCAGAGAAAAAACCTGCGAAAACAGTGAAA
GAAAAGGTCAAGATCCCCTACGAACCAGAAAATGAAGGGG
ACGAGCTTCAAGTGCAAATCGCGGTTGACGATGCGGATCA
>2
CCATATCGGAGACAGCAGATGCTATTTGCTTCAGGACGAT
GATTTCGTTCAAGTGACAGAAGACCATTCGCTTGTAAATG
AACTGGTTCGCACTGGAGAGATTTCCAGAGAAGACGCTGA
ACATCATCCGCGAAAAAATGTGTTGACGAAGGCGCTTGGA
ACAGACCAGTTAGTCAGTATTGACACCCGTTCCTTTGATA
TAGAACCCGGAGACAAACTGCTTCTATGTTCTGACGGACT
GACAAATAAAGTGGAAGGCACTGAGTTAAAAGACATCCTG
TGGACAAAGCCAATCAGAATGGCGGAGAAGGCGGAGAAGC
>3
ATAAAACAACGGTATTTGCCGGTCAGTCCGGTGTTGGGAA
ATCCTCGCTTCTCAACGCGATCAGTCCGGAGCTCGGATTA
AGAACAAACGAGATTTCCGAGCATTTGGGCCGCGGGAAAC
ACACAACCCGCCACGTGGAGCTGATTCACACGTCCGGAGG
TTTGGTTGCAGATACACCGGGATTCAGCTCGCTTGAATTT
ACAGACATTGAGGAAGAAGAGCTGGGCTATACCTTCCCTG
ATATCAGAGAAAAAAGCTCTTCATGCAAATTTAGAGGCTG
TTTACATCTGAAAGAGCCGAAATGTGCGGTGAAACAAGCT
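A minimal sketch in plain Perl (core only, no BioPerl) of loading several FASTA records from one file. It assumes each record starts with a '>' header whose first word serves as the ID, and that any whitespace inside sequence lines can simply be dropped. The sub name read_fasta is hypothetical, and the demo reads an in-memory excerpt (the first 40 bases of sequences 1 and 2) rather than the real file.

```perl
use strict;
use warnings;

# Read FASTA records from an open filehandle.
# Returns a list of [id, sequence] pairs in file order.
sub read_fasta {
    my ($fh) = @_;
    my @records;
    while (my $line = <$fh>) {
        chomp $line;
        next if $line =~ /^\s*$/;                 # skip blank lines
        if ($line =~ /^>/) {                       # header line such as ">1"
            my ($id) = $line =~ /^>\s*(\S*)/;      # first word after '>'
            push @records, [ $id, '' ];
        }
        elsif (@records) {
            $line =~ s/\s+//g;                     # drop spaces, tabs, stray \r
            $records[-1][1] .= uc $line;           # append to the current record
        }
        else {
            die "Sequence data before the first '>' header (line $.)\n";
        }
    }
    return @records;
}

# Demo on an in-memory excerpt. For the real file, use instead:
#   open my $fh, '<', 'gens.txt' or die "Cannot open gens.txt: $!";
my $sample = <<'FASTA';
>1
AGTATCGGACCCGAAGACATTACGCTTAGAGACTTGAAAA
>2
CCATATCGGAGACAGCAGATGCTATTTGCTTCAGGACGAT
FASTA
open my $fh, '<', \$sample or die "Cannot open in-memory sample: $!";
for my $rec (read_fasta($fh)) {
    printf "%s\t%d bp\n", $rec->[0], length $rec->[1];
}
```

Keeping the records in an array (rather than a hash keyed by ID) preserves the file order, which matters later when the sequences are compared column by column.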
Then the script should check how similar the sequences are, print the percent identity, and also generate a consensus sequence. Any help would be greatly appreciated!
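A minimal sketch of one possible definition of percent identity: two sequences of equal length are compared position by position, and the share of identical positions is reported. This assumes the sequences are already aligned, with no insertions or deletions. The three example sequences each contain 320 bases, so a direct comparison is possible, but if real data differ in length they would first need a proper alignment (for example with an external aligner such as ClustalW or MUSCLE). The sub name percent_identity is hypothetical, and the demo uses only the first 40 bases of each example sequence.

```perl
use strict;
use warnings;

# Percent identity of two equal-length sequences, compared position by
# position (case-insensitive). No alignment is performed and gaps get no
# special treatment: the sequences are assumed to be aligned already.
sub percent_identity {
    my ($x, $y) = (uc($_[0]), uc($_[1]));
    die "Sequences differ in length\n" unless length($x) == length($y);
    return 0 unless length $x;    # avoid dividing by zero on empty input
    my $same = 0;
    for my $i (0 .. length($x) - 1) {
        $same++ if substr($x, $i, 1) eq substr($y, $i, 1);
    }
    return 100 * $same / length($x);
}

# Demo: first 40 bases of sequences 1-3 from the question (excerpt only).
my %seqs = (
    1 => 'AGTATCGGACCCGAAGACATTACGCTTAGAGACTTGAAAA',
    2 => 'CCATATCGGAGACAGCAGATGCTATTTGCTTCAGGACGAT',
    3 => 'ATAAAACAACGGTATTTGCCGGTCAGTCCGGTGTTGGGAA',
);
my @ids = sort keys %seqs;

# Print every pair once (1 vs 2, 1 vs 3, 2 vs 3).
for my $i (0 .. $#ids - 1) {
    for my $j ($i + 1 .. $#ids) {
        printf "%s vs %s: %.1f%% identity\n", $ids[$i], $ids[$j],
            percent_identity($seqs{ $ids[$i] }, $seqs{ $ids[$j] });
    }
}
```

Reporting every pair is only one choice; another common summary for a whole set is the percentage of columns where all sequences agree.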

Re: How to read and write several DNA sequences in the format FASTA placed in one file?
by toolic (Bishop) on Apr 25, 2011 at 13:20 UTC
Re: How to read and write several DNA sequences in the format FASTA placed in one file?
by Corion (Patriarch) on Apr 25, 2011 at 13:19 UTC

    Have a look at http://bioperl.org. Also, we are not biologists, and thus "sequence similarity" and "consensus sequence" tell us very little. You will have to provide a more technical explanation of how you would measure similarity and what "consensus" would mean in your case. Also see what has to offer for FASTA.
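Since "consensus" has to be defined before it can be computed, here is a minimal sketch of one common reading, a majority-rule consensus: for each column of equal-length, already-aligned sequences, the most frequent base is taken. The tie handling is an assumption (an 'N' is emitted when two or more bases share the top count); IUPAC ambiguity codes would be another option. The sub name consensus is hypothetical, and the demo uses only the first 40 bases of each example sequence.

```perl
use strict;
use warnings;

# Majority-rule consensus of equal-length (already aligned) sequences.
# At each column the most frequent character wins; if two or more
# characters tie for the top count, 'N' is emitted (an arbitrary choice).
sub consensus {
    my @seqs = map { uc($_) } @_;
    return '' unless @seqs;
    my $len = length $seqs[0];
    for my $s (@seqs) {
        die "Sequences differ in length\n" unless length($s) == $len;
    }
    my $out = '';
    for my $i (0 .. $len - 1) {
        my %count;
        $count{ substr($_, $i, 1) }++ for @seqs;    # tally bases in column $i
        # Highest count first; alphabetical order only makes the sort stable.
        my @ranked = sort { $count{$b} <=> $count{$a} || $a cmp $b } keys %count;
        my $tie = @ranked > 1 && $count{ $ranked[0] } == $count{ $ranked[1] };
        $out .= $tie ? 'N' : $ranked[0];
    }
    return $out;
}

# Demo: first 40 bases of sequences 1-3 from the question (excerpt only).
my @demo = (
    'AGTATCGGACCCGAAGACATTACGCTTAGAGACTTGAAAA',
    'CCATATCGGAGACAGCAGATGCTATTTGCTTCAGGACGAT',
    'ATAAAACAACGGTATTTGCCGGTCAGTCCGGTGTTGGGAA',
);
print consensus(@demo), "\n";
```

With only three sequences, any column where all three bases differ ends up as 'N', so the tie rule has a visible effect on short or divergent inputs.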

    Please note that this site is not a script writing service. While we will assist you with writing your code, we expect you to do the majority of the work and show some effort. This mostly involves you showing the code you have already written and explaining where you encounter the problem.