yuvraj_ghaly has asked for the wisdom of the Perl Monks concerning the following question:

Replies are listed 'Best First'.
Re: Counting amino acids
by davido (Cardinal) on Jul 19, 2013 at 04:48 UTC

    That's a product spec. What have you written so far, that we might help you with?


    Dave

      #! /usr/bin/perl -w
      use strict;

      open (S, "$ARGV[0]") || die "cannot open FASTA file to read: $!";

      my %s;    # a hash of arrays, to hold each line of sequence
      my %seq;  # a hash to hold the AA sequences.
      my $key;

      while (<S>){ # Read the FASTA file.
          chomp;
          if (/>/){
              s/>//;
              $key= $_;
          }else{
              push (@{$s{$key}}, $_);
          }
      }

      foreach my $a (keys %s){
          my $s= join("", @{$s{$a}});
          $seq{$a}=$s;
          #print("$a\t$s\n");
      }

      my @aa= qw(A R N D C Q E G H I L K M F P S T W Y V);
      my $aa= join("\t", @aa);
      print ("Sequence\t$aa\n");

      foreach my $k (keys %seq){
          my %count; # a hash to hold the count for each amino acid in the protein.
          my @seq= split(//, $seq{$k});
          foreach my $r(@seq){
              $count{$r}++;
          }
          my @row;
          push(@row, $k);
          foreach my $a (@aa){
              $count{$a}||=0;
              $count{$a}= sprintf("%0.1f",($count{$a}/length($seq{$k}))*100);
              push(@row,$count{$a});
          }
          my $row= join("\t",@row);
          print("\n$row\n");
      }

      This is the code, but it gives its output as percentages and I need it in decimal.

        yuvraj_ghaly:

        It looks like your problem is on line 42. Just stop converting the values to percentages; instead, convert them to the format you want.

        ...roboticus

        When your only tool is a hammer, all problems look like your thumb.

Re: Counting amino acids
by space_monk (Chaplain) on Jul 19, 2013 at 05:53 UTC

    As the previous poster has said, you have not given us much to go on, but if you are just starting out and have nothing so far, here's a plan of attack for the problem:

    One way to do it is to write your own routine that reads the FASTA format and then simply greps for, or counts, the amino acids. However, this is not the recommended way!
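
    For illustration only, a minimal sketch of such a hand-rolled routine might look like the following. The helper names read_fasta and count_residues are hypothetical (they appear nowhere else in this thread), and the input is a made-up in-memory string standing in for a real FASTA file.

```perl
use strict;
use warnings;

# Hypothetical helper: parse FASTA text from an open filehandle into a
# hash mapping each header (without the leading '>') to its full sequence.
sub read_fasta {
    my ($fh) = @_;
    my (%seq, $id);
    while (my $line = <$fh>) {
        chomp $line;
        next if $line =~ /^\s*$/;      # skip blank lines
        if ($line =~ /^>(.*)/) {       # a header line starts a new record
            $id = $1;
            $seq{$id} = '';
        }
        elsif (defined $id) {          # sequence line: append to current record
            $seq{$id} .= $line;
        }
    }
    return \%seq;
}

# Hypothetical helper: count how often each character occurs in a sequence.
sub count_residues {
    my ($sequence) = @_;
    my %count;
    $count{$_}++ for split //, $sequence;
    return \%count;
}

# Usage with an in-memory filehandle standing in for a real FASTA file.
my $fasta = ">seq1\nMKV\nKA\n>seq2\nGG\n";
open my $fh, '<', \$fasta or die "cannot open in-memory FASTA: $!";
my $seqs = read_fasta($fh);
close $fh;

for my $id (sort keys %$seqs) {
    my $count = count_residues($seqs->{$id});
    print join("\t", $id, map { "$_=$count->{$_}" } sort keys %$count), "\n";
}
```

    Sequence lines are concatenated per record, so a sequence wrapped over several lines is counted as one string.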

    A better way is to look on CPAN and see if someone has done some or all of the work for you. Bio::DB::Fasta, Bio::SeqIO, FastaParse and Bio::Phylo::IO are all starting points for reading FASTA files and parsing FASTA data.
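
    As a rough sketch of the CPAN route, the following uses Bio::SeqIO from BioPerl. It assumes BioPerl is installed; the sub name count_residues_bioperl is hypothetical, and the file name is taken from the command line purely for illustration.

```perl
use strict;
use warnings;

# Sketch only: assumes BioPerl is installed. Bio::SeqIO is loaded lazily
# inside the sub so that this file still compiles on a machine without it.
sub count_residues_bioperl {
    my ($path) = @_;
    require Bio::SeqIO;
    my $in = Bio::SeqIO->new(-file => $path, -format => 'fasta');
    my %counts;                              # id => { residue => count }
    while (my $seq = $in->next_seq) {
        my $residues = $seq->seq;
        next unless defined $residues;       # skip records without sequence data
        my %count;
        $count{$_}++ for split //, uc $residues;
        $counts{ $seq->display_id } = \%count;
    }
    return \%counts;
}

# Hypothetical usage: pass a FASTA file name as the first argument.
if (@ARGV) {
    my $counts = count_residues_bioperl($ARGV[0]);
    for my $id (sort keys %$counts) {
        my $count = $counts->{$id};
        print join("\t", $id, map { "$_=$count->{$_}" } sort keys %$count), "\n";
    }
}
```

    The parsing is left entirely to the module; only the counting loop is written by hand.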

    If you spot any bugs in my solutions, it's because I've deliberately left them in as an exercise for the reader! :-)

Re: Counting amino acids
by mtmcc (Hermit) on Jul 19, 2013 at 07:03 UTC

    Ah, you want to modify this script?

    If you don't want percentages, you can just modify the last $count line:

    from

    $count{$a}= sprintf("%0.1f",($count{$a}/length($seq{$k}))*100);

    to this:

    $count{$a}= sprintf("%0.6f",($count{$a}/length($seq{$k})));
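
    To make the difference between the two sprintf calls concrete, here is a small standalone sketch. The helper format_share and the example sequence are made up for illustration and are not part of the original script.

```perl
use strict;
use warnings;

# Hypothetical helper: format a residue's share of the sequence either as a
# percentage (one decimal place) or as a fraction (six decimal places).
sub format_share {
    my ($count, $length, $as_percent) = @_;
    return $as_percent
        ? sprintf("%0.1f", ($count / $length) * 100)
        : sprintf("%0.6f", $count / $length);
}

my $seq = "MKVKA";              # made-up sequence for illustration
my $k   = ($seq =~ tr/K//);     # tr/// returns the number of K residues: 2

print format_share($k, length($seq), 1), "\n";   # prints 40.0
print format_share($k, length($seq), 0), "\n";   # prints 0.400000
```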

      See, the problem is that I don't want any decimals. This code divides the count of each amino acid by the overall length of the sequence. I just need the raw amino acid counts.

        Well then, replace the line I mentioned above with:

         $count{$a};
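
        In the posted script the raw count is already stored in %count before the sprintf line runs, so dropping the division is enough. A minimal standalone sketch of a raw-count row, assuming the same residue order as the original @aa list (the helper count_row and the sample sequence are hypothetical):

```perl
use strict;
use warnings;

# Same residue order as in the script posted above.
my @aa = qw(A R N D C Q E G H I L K M F P S T W Y V);

# Hypothetical helper: build one tab-separated output row of raw counts.
sub count_row {
    my ($name, $sequence) = @_;
    my %count;
    $count{$_}++ for split //, $sequence;
    # Residues that never occur are reported as 0; no division, no sprintf.
    return join "\t", $name, map { $count{$_} || 0 } @aa;
}

print join("\t", "Sequence", @aa), "\n";
print count_row("seq1", "MKVKA"), "\n";
```

        Characters outside the 20 listed residues are still counted internally but do not appear in the row.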