in reply to Re^11: calculation of charged amino acids
in thread calculation of charged amino acids

modification of this code is done here but error is showing only in counting "aromatic residue"

any suggestions

#!/usr/bin/env perl use 5.010; use strict; use warnings; use autodie; my $header = ''; my (@headers_found, %header_data); open my $fasta_fh, '<', $ARGV[0]; while (<$fasta_fh>) { chomp; next if /^\s*($|#)/; if (/^>(.*)$/) { push @headers_found, ($header = $1); } else { die 'Sequence data found without a header!' unless $header; for (split '', "\U$_") { /(?<a>[BDEZ])|(?<b>[KRH])|(?<ali>[AVLI])|(?<aro>[FHYW])|(?< +po>[DEHKNQRSTZ])|(?<nonpo>[ACFGILMPVWY])/; if ($_ =~ /(?<u>[XUGJOP])/){ next;} ++$header_data{$header}{(keys %+)[0]}; } } } close $fasta_fh; for (@headers_found) { say "Header: $_"; say "\tAcidic: $header_data{$_}{a}"; say "\tBasic: $header_data{$_}{b}"; say "\tAliphatic: $header_data{$_}{ali}"; say "\tAromatic: $header_data{$_}{aro}"; say "\tPolar: $header_data{$_}{po}"; say "\tNonpolar: $header_data{$_}{nonpo}"; # say "\tUnknown: $header_data{$_}{u}"; }

Replies are listed 'Best First'.
Re^13: calculation of charged amino acids
by mtmcc (Hermit) on Jul 30, 2013 at 12:44 UTC
    I haven't really done anything to it, but it works for me.

    Using this fasta file:

    >DROTME_HH_Q02936 
    MRHIAHTQRCLSRLTSLVALLLIVLPMVFSPAHSCGPGRGLGRHRARNLY
    PLVLKQTIPNLSEYTNSASGPLEGVIRRDSPKFKDLVPNYNRDILFRDE
    >DROME_HH_Q02937 
    MRHIAHTQRCLSRLTSLVALLLIVLPMVFSPAHSCGPGRGLGRHRARNLY
    PLVLKQTIPNLSEYTNSASGPLEGVIRRDSPKFKDLVPNYNRDILFRDEE
    GTGADRLMSKRCKEKLNVLAYSVMNEWPGIRLLTTTTTTESWDEDYHHGQ
    YEGRAVTIATSDRDQSKYGMLARLAVEAGFDWVSYVSRRHIYCSVKSDSS
    ESLH
    >DROME_HH_Q02938 
    GTGADRLMSKRCKEKLNVLAYSVMNEWPGIRLLTTTTTTESWDEDYHHGQ
    YEGRAVTIATSDRDQSKYGMLARLAVEAGFDWVSYVSRRHIYCSVKSDSS
    ESLH
    

    And this code (essentially the same as you've posted):

    #!/usr/bin/perl use strict; use warnings; use autodie; my $header = ''; my (@headers_found, %header_data); open my $fasta_fh, '<', $ARGV[0]; while (<$fasta_fh>) { chomp; next if /^\s*($|#)/; if (/^>(.*)$/) { push @headers_found, ($header = $1); } else { die 'Sequence data found without a header!' unless $he +ader; for (split '', "\U$_") { /(?<a>[BDEZ])|(?<b>[KRH])|(?<ali>[AVLI])|(?<aro +>[FHYW])|(?<po>[DEHKNQRSTZ])|(?<nonpo>[ACFGILMPVWY])/; if ($_ =~ /(?<u>[XUGJOP])/){ next;} ++$header_data{$header}{(keys %+)[0]}; } } } close $fasta_fh; for (@headers_found) { print STDERR "\n\nHeader: $_"; print STDERR "\nAcidic: $header_data{$_}{a}"; print STDERR "\nBasic: $header_data{$_}{b}"; print STDERR "\nAliphatic: $header_data{$_}{ali}"; print STDERR "\nAromatic: $header_data{$_}{aro}"; print STDERR "\nPolar: $header_data{$_}{po}"; print STDERR "\nNonpolar: $header_data{$_}{nonpo}"; print STDERR "\nUnknown: $header_data{$_}{u}" if defined $head +er_data{$_}{u}; }

    I get this output:

    Header: DROTME_HH_Q02936 
    Acidic:  7
    Basic:   18
    Aliphatic: 31
    Aromatic: 6
    Polar: 19
    Nonpolar: 4
    
    Header: DROME_HH_Q02937 
    Acidic:  22
    Basic:   35
    Aliphatic: 56
    Aromatic: 16
    Polar: 44
    Nonpolar: 9
    
    Header: DROME_HH_Q02938 
    Acidic:  14
    Basic:   17
    Aliphatic: 25
    Aromatic: 10
    Polar: 25
    Nonpolar: 5
    

    If you're not getting that result, then post the error message that you get.

      This works fine now. However it is not giving any output file as required.

       perl count_aa.pl <filename> >> <output file>

      this is the command i used in console screen. If i want to get output in a file, it is not working

        this is the command i used in console screen

        Try perl count_aa.pl infile >> outfile

      This is the sample fasta file for which program gives wrong calculation. I have manually counted the residues which does not correspond to program output
      for Aromatic residues:
      A 37
      V 10
      L 12
      I 6
      Total= 65 for polar residues:
      D 12
      E 7
      H 2
      K 12
      N 3
      Q 4
      R 20
      S 7
      T 4
      Z 0
      Total= 71
      for nonpolar residues:
      A 37
      C 3
      F 2
      G 17
      I 6
      L 12
      M 3
      P 13
      V 10
      W 3
      Y 1
      Total= 107

      if you run the program using this sequence you will find the error.

        You didn't post a fasta file, but in any case, the matching is behaving exactly as it's written.

        It won't match all the residues to all of the groups, it will only match each residue to the first group that appears in your regex. If that doesn't make sense, have a look at this.

        Your posts are becoming silly. You're reintroducing problems you've previously been shown how to fix. Please stop ignoring what people are telling you, please make notes and learn from your mistakes.

        You seem to be relying on others writing code for you. This isn't really a wise strategy. Most people are happy to help you learn and to point out mistakes and suggest improvements, depending on the good will of others to do your job/tasks isn't fair. What will you do when people get tired of doing things for you? Review the links you've previously been given and learn the basics of the tool you have chosen to use.