in reply to Re^2: Use of uninitialized value in string eq
in thread Use of uninitialized value in string eq

Oops, my bad. I should have used the hash slice correctly.

Changing push (@info{$gi}, $line) to push (@{$info{$gi}}, $line) solved the problem.
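The difference between the two forms can be sketched as follows. The gi numbers and sequence fragments are made-up sample data, not taken from the real input file:

```perl
use strict;
use warnings;

my %info;

# Hypothetical sample data: (gi, sequence line) pairs.
my @records = (
    [ '134611', 'MATKAVCVLK' ],
    [ '134611', 'GDGPVQGIIN' ],
    [ '112419', 'AMVKAVCVLA' ],
);

for my $record (@records) {
    my ($gi, $line) = @$record;

    # @{ $info{$gi} } dereferences the array reference stored under $gi.
    # Perl creates that array on first use (autovivification).
    push @{ $info{$gi} }, $line;
}

# @info{$gi} would instead be a hash slice (a list of hash values),
# which push cannot use as its target array.
print "$_: @{ $info{$_} }\n" for sort keys %info;
```

Each hash value ends up holding a reference to an array of lines, so a gi that appears more than once keeps all of its lines.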

So I am left with the error involving @VariantList. Probably I am making another fundamental mistake when assigning, e.g., trying to assign an array into a scalar. I am looking into it.

Re^4: Use of uninitialized value in string eq
by sophix (Sexton) on Apr 23, 2010 at 16:24 UTC
    Victory! I made the code work properly. Nevertheless, I probably still have efficiency problems. I would appreciate it if you could comment on points where I can improve my code (in terms of both performance and appropriate style). Thanks guys, I could not have done it without your help!

    Here is the final ugly code.

    #!/usr/bin/perl -w
    use strict;
    use Data::Dumper;

    my %info = ();
    my ($gi, $humangi, $accession);
    my $data = '/DATA/proteinfile.txt';
    open INFILE, '<', $data or die "Failed at opening $data!\n";

    # Construct the hash with GIs as keys and sequences as values
    while ( <INFILE> ) {
        my $line = $_;
        chomp($line);
        last if m!END!;
        if($line=~m/HUMAN/){
            ($humangi) = ($line=~m/^\S+\|(\d+)/);
            ($accession) = ($line=~m/^\S+\|\d+\|\w+\|(\S{6}?)/);
        }
        if($line=~m/^\S+\|(\d+)/) {
            if(defined($1)) {
                $gi=$1;
            }
        }
        else {
            $info{$gi} = $line;
        }
    }
    #print Dumper (\%info);
    print "$humangi\n";
    print "$accession\n";
    close(INFILE);

    my $data2 = '/DATA/variantlist.txt';
    open INFILE2, '<', $data2 or die "Failed at opening $data2!\n";
    my $data3 = '/DATA/VariantOutput.txt';
    open OUTFILE, '>', $data3 or die "Failed at opening $data3!\n";
    print OUTFILE "This is [GI: $humangi] and [Accession: $accession]\nVARIANT\t\tPOTENTIAL\t\tPD\n";

    while ( <INFILE2>){
        # Grab a variant from the file (in this example: P82L)
        my $line2 = $_;
        chomp($line2);
        my $Variant = $line2;
        # Split the variant into three parts
        my ($source, $position, $sink) = split(/(\d+)(\w)/, $Variant);
        #print "$source\t$position\t$sink\n";
        # Check whether HS has the source (i.e., P) at the given position (i.e., 82)
        my $temp = $info{$humangi};
        #print "Temp contains $temp" . "\n";
        my @char = split //, $temp;
        #print "Now \@char contains: @char";
        #print "Inside the temp: $char[0] and $char[1]\n";
        my $target = $char[$position-1];
        #print "This is the target: $target" . "\n";
        if ( $target eq $source) {
            print "Yep!\n";
        }
        my @VariantList = ();
        my @PDList = ();
        # Scan the rest of the sequences to check what amino acid they have at the given position
        for my $gi ( keys %info ) {
            my $value = $info{$gi};
            my @char2 = split //, $value;
            my $potential = $char2[$position-1];
            push (@VariantList, $potential);
            if ($potential eq $sink){ # Note the cases where we observe the sink (i.e., L) at this position
                my $pd = "$potential" . "{" . "$gi" . "}";
                push (@PDList, $pd)
                #print "A pathogenic deviation has been found at site $position - from $source to $sink !\n" . " And the corresponding gi for this deviation is: $gi\n";
            }
        }
        print OUTFILE "$Variant\t\t@VariantList\t\t@PDList\n";
    }
    close(INFILE2);

      Instead of -w, put use warnings; in your script. The command line flag applies to any modules you include as well as to your own code, so with -w you may get warnings from code you essentially have no control over.
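      A minimal sketch of the scoping difference: the warnings pragma is lexically scoped, whereas -w switches warnings on globally. The $SIG{__WARN__} handler and the variable names are there only for illustration:

```perl
use strict;
use warnings;

# Collect warnings in an array instead of printing them (illustration only).
my @caught;
$SIG{__WARN__} = sub { push @caught, $_[0] };

my $missing;    # deliberately left undefined

# Warnings are enabled here, so this concatenation is reported.
my $noisy = "value: " . $missing;

{
    # The pragma is lexically scoped: this block opts out of one category.
    no warnings 'uninitialized';
    my $quiet = "value: " . $missing;
}

print scalar(@caught), " warning(s) caught\n";
print $caught[0] if @caught;
```

      Only the first concatenation is reported, because the second one sits inside the block that disables the 'uninitialized' category.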

      Use the special variable $! in file i/o error handling messages to give a little more information about the nature of the failure.
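      For instance, a minimal sketch of what $! adds to the message. The path is a made-up one chosen so that open fails, and eval is used only so the sketch can carry on and show the message:

```perl
use strict;
use warnings;

# Hypothetical path that is expected not to exist.
my $path = '/no/such/dir/proteinfile.txt';

# eval traps the die so that the message can be inspected afterwards.
my $ok = eval {
    open my $in, '<', $path or die "Failed at opening $path: $!\n";
    close $in;
    1;
};
my $error = $@;

# $! supplied the operating system's reason for the failure.
print "Caught: $error" unless $ok;
```

      In a real script you would normally let die end the program; the point is only that the reason for the failure appears after the colon.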

      You don't need to set arrays or hashes empty when you declare them. They are sold without batteries.

      Don't declare variables in a lump: my ($foo, $baa, $baz). It makes it harder to see where they are declared and precludes providing a usage comment (although that shouldn't often be required).

      It's not clear from the input loop whether you expect more than one gi. Your use of $accession implies that there should only be one (else the accession value isn't related to the gi value even though they come from the same line of data). You still haven't addressed the possibility that you get a well formatted sequence line before you get a gi line. These are related issues.
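      One way to deal with a sequence line that turns up before any gi line, sketched under the assumption that such lines should be reported and skipped. The sample lines and the @orphans array are hypothetical:

```perl
use strict;
use warnings;

# Hypothetical input: the first line is a sequence with no header before it.
my @lines = (
    'MATKAVCVLK',
    '>gi|134611|sp|P00441.2|_Homo_sapiens',
    'MATKAVCVLKGDGPVQ',
);

my %info;
my $gi;         # gi of the most recent header line; undef until one is seen
my @orphans;    # sequence lines that arrived before any header

for my $line (@lines) {
    if ($line =~ m/^\S+\|(\d+)/) {
        $gi = $1;
    }
    elsif (defined $gi) {
        $info{$gi} = $line;
    }
    else {
        # No header yet, so there is no key to store this line under.
        push @orphans, $line;
        warn "Sequence line seen before any gi header: $line\n";
    }
}

print "Stored sequences for: @{[ sort keys %info ]}\n";
```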

      Use statement modifiers where you have a trivial statement controlled by a condition. For example $gi = $1 if defined $1;.

      Avoid a proliferation of temporary variables and assignments. For example, see the change I made to your second while loop.

      Avoid comments that say the same thing as the code. 'split into three parts' adds no extra information that a trivial inspection of the code doesn't tell you. The data format related comments ('... P82L ...') however are good!

      You can use {} to ensure a variable is interpolated correctly in a string. For example you can write "${potential}{$gi}" instead of concatenating a bunch of substrings together.
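      A small sketch comparing the two spellings; the values are made up:

```perl
use strict;
use warnings;

my $potential = 'L';
my $gi        = '134611';

# Original style: concatenating several small strings.
my $concatenated = "$potential" . "{" . "$gi" . "}";

# Braces end the variable name at 'potential', so the following {$gi}
# is not parsed as a hash subscript; it is a literal brace pair around $gi.
my $interpolated = "${potential}{$gi}";

print "$interpolated\n";
print "Both spellings agree\n" if $concatenated eq $interpolated;
```

      Without the braces, "$potential{$gi}" would be read as an element of a hash named %potential, which does not exist here and so fails to compile under strict.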

      My reworked version with the changes implied is:

      #!/usr/bin/perl
      use strict;
      use warnings;
      use Data::Dumper;

      my $data = '/DATA/proteinfile.txt';
      open my $inFile, '<', $data or die "Failed at opening $data: $!\n";

      # Populate the info hash with GIs as keys and sequences as values
      my $humanGi;
      my $accession;
      my $gi;    # Current gi while reading sequences
      my %info;

      while (<$inFile>) {
          my $line = $_;
          chomp $line;
          last if m!END!;
          if ($line =~ m/HUMAN/) {
              ($humanGi, $accession) = $line =~ m/^\S+\|(\d+)\|\w+\|(\S{6}?)/;
          }
          if ($line =~ m/^\S+\|(\d+)/) {
              $gi = $1 if defined $1;
          }
          else {
              $info{$gi} = $line;
          }
      }
      close $inFile;

      my $data2 = '/DATA/variantlist.txt';
      open $inFile, '<', $data2 or die "Failed at opening $data2: $!\n";
      my $data3 = '/DATA/VariantOutput.txt';
      open my $outFile, '>', $data3 or die "Failed at opening $data3: $!\n";
      print $outFile "This is [GI: $humanGi] and [Accession: $accession]\nVARIANT\t\tPOTENTIAL\t\tPD\n";

      while (defined (my $Variant = <$inFile>)) {
          # Grab a variant from the file (in this example: P82L)
          chomp $Variant;
          my ($source, $position, $sink) = split /(\d+)(\w)/, $Variant;

          # Check whether HS has the source (i.e., P) at the given position (i.e., 82)
          my @char = split //, $info{$humanGi};
          my $target = $char[$position - 1];
          my @VariantList;
          my @PDList;

          # Scan the rest of the sequences to check what amino acid they have at
          # the given position
          for my $gi (keys %info) {
              my @char2 = split //, $info{$gi};
              my $potential = $char2[$position - 1];
              push @VariantList, $potential;
              if ($potential eq $sink) {
                  # Note the cases where we observe the sink (i.e., L) at this position
                  push @PDList, "${potential}{$gi}";
              }
          }
          print $outFile "$Variant\t\t@VariantList\t\t@PDList\n";
      }
      close $inFile;
      close $outFile;

      This is still rather unsatisfactory code because there are many ways it can fail due to unexpected data. Sanity checking helps ensure that the code and the data format conform to the same expectations, and makes it much easier to diagnose problems when expectations aren't met.
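      As a rough sketch of the kind of sanity checking meant here, assuming variants always look like P82L (one uppercase letter, a position, one uppercase letter). parse_variant and residue_at are hypothetical helper names, not part of the code above, and the sequence is a made-up short one:

```perl
use strict;
use warnings;

# Parse a variant such as 'P82L' into (source, position, sink).
# Returns an empty list when the text does not have that shape.
sub parse_variant {
    my ($variant) = @_;
    return $variant =~ m/^([A-Z])(\d+)([A-Z])$/ ? ($1, $2, $3) : ();
}

# Return the residue at a 1-based position, or undef when out of range.
sub residue_at {
    my ($sequence, $position) = @_;
    return if $position < 1 || $position > length $sequence;
    return substr $sequence, $position - 1, 1;
}

my $sequence = 'MATKAVCVLK';    # hypothetical short sequence

for my $variant ('A2V', 'A99V', 'bogus') {
    my ($source, $position, $sink) = parse_variant($variant);
    if (!defined $source) {
        warn "Skipping malformed variant '$variant'\n";
        next;
    }

    my $residue = residue_at($sequence, $position);
    if (!defined $residue) {
        warn "Position $position of '$variant' is outside the sequence\n";
        next;
    }

    my $verdict = $residue eq $source ? 'matches' : 'does not match';
    print "$variant: residue $residue $verdict source $source\n";
}
```

      Reporting the offending variant in the warning is what makes an unexpected input line easy to track down later.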

      There are no obvious inefficiencies in the current code. Initially, efficiency shouldn't be a major consideration in any case. Generally, if you avoid nested loops to the extent reasonable and avoid re-reading input files, code of this sort will perform well enough for most purposes.

      True laziness is hard work
        Thank you very much, GrandFather. I'll pay close attention to all points you mentioned.

        Redundant comments are a result of my code-writing process. I first write down the comments as instructions to myself for implementing the algorithm. I should have removed or modified them afterwards so that they are used appropriately.

        Thank you once again. I have just begun writing small scripts to help me with my research, and I enjoy writing code a lot. I hope to improve my programming skills for the sake of both productivity and fun! :)

        Ah, ah, I should not have shouted "victory" so early! That was only the battle, I seem to have lost the war here.

        The code works for some input and not for the rest! I have manipulated the input file to get some insight into what might cause the problem, yet I have found nothing. I'll post it here, hoping that someone might have an idea.

        Essentially, the problem is that the script takes the first 22 input items (of two lines each) and ignores the rest. All these items have the same format. I could not figure out why there is such a limitation on the input when there is no difference at all between the various input items.

        I hope the files do not violate any space limitation (if there is one).

        The code -- revised version by GrandFather.

        #!/usr/bin/perl
        use strict;
        use warnings;
        use Data::Dumper;

        my $data = '/DATA/alignment.fas';
        open my $inFile, '<', $data or die "Failed at opening $data: $!\n";

        # Populate the info hash with GIs as keys and sequences as values
        my $humanGi;
        my $accession;
        my $gi;    # Current gi while reading sequences
        my %info;

        while (<$inFile>) {
            my $line = $_;
            chomp $line;
            last if m!END!;
            if ($line =~ m/(HUMAN|Homo)/) {
                ($humanGi, $accession) = $line =~ m/^\S+\|(\d+)\|\w+\|(\S{6}?)/;
            }
            if ($line =~ m/^\S+\|(\d+)/) {
                $gi = $1 if defined $1;
            }
            else {
                $info{$gi} = $line;
            }
        }
        print Dumper (\%info);
        close $inFile;

        my $data2 = '/DATA/variantList.txt';
        open $inFile, '<', $data2 or die "Failed at opening $data2: $!\n";
        my $data3 = '/DATA/pathogenList.txt';
        open my $outFile, '>', $data3 or die "Failed at opening $data3: $!\n";
        print $outFile "This is [GI: $humanGi] and [Accession: $accession]\nVARIANT\t\tPOTENTIAL\t\tPD\n";

        while (defined (my $Variant = <$inFile>)) {
            # Grab a variant from the file (in this example: P82L)
            chomp $Variant;
            my ($source, $position, $sink) = split /(\d+)(\w)/, $Variant;

            # Check whether HS has the source (i.e., P) at the given position (i.e., 82)
            #my @char = split //, $info{$humanGi};
            #my $target = $char[$position - 1];
            my @VariantList;
            my @PDList;

            # Scan the rest of the sequences to check what amino acid they have at
            # the given position
            foreach my $gi (keys %info) {
                my @char2 = split //, $info{$gi};
                my $potential = $char2[$position - 1];
                push @VariantList, "${potential}{$gi}";
                if ($potential eq $sink) {
                    # Note the cases where we observe the sink (i.e., L) at this position
                    push @PDList, "${potential}{$gi}";
                }
            }
            print $outFile "$Variant\t@VariantList\t@PDList\n";
        }
        close $inFile;
        close $outFile;

        First input. Variant list.

        A5V
        A5S
        A5T
        C7F
        V8E
        L9Q
        L9V
        G13R
        V15G
        V15M
        G17S
        F21C
        E22K
        E22G
        Q23L
        G38R
        L39R
        L39V
        G42D
        G42S
        H44R
        F46C
        H47R
        H49R
        H49Q
        E50K
        T55R
        N66S
        L68R
        G73S
        D77Y
        H81A
        L85F
        L85V
        G86R
        N87S
        V88A
        A90T
        A90V
        D91A
        D91V
        G94A
        G94C
        G94D
        G94R
        G94V
        V98M
        E101G
        E101K
        D102N
        D102G
        I105F
        S106L
        L107V
        G109V
        I113M
        I113T
        I114T
        G115A
        R116G
        V119L
        D125V
        D125G
        D126H
        L127S
        S135N
        N140K
        L145F
        L145S
        A146T
        C147R
        G148R
        V149G
        V149I
        I150T
        I152T

        and the second input file. >gi|134611|sp|P00441.2|_Homo_sapiens MATKAVCVLKGDGPVQGIINFEQKESNGPVKVWGSIKGLTEGLHGFHVHEFGDNTAGCTSAGPHFNPLSR +KHGGPKDEERHVGDLGNVTADKDGVADVSIEDSVISLSGDHCIIGRTLVVHEKADDLGKGGNEESTKTG +NAGSRLACGVIGIAQ >gi|112419222|Xenopus_laevis AMVKAVCVLAGSGDVKGVVRFEQQDD-GDVTVEGKIEGLTDGNHGFHIHVFGDNTNGCLSAGPHFNPQNK +NHGSPKDADRHVGDLGNVTA-EGGVAQFKFTDPQISLKGERSIIGRTAVVHEKQDDLGKGGDDESLKTG +NAGGRLACGVIGFCP >gi|62858937|_Xenopus_(Silurana)_tropi... -MVRAVCVLAGSGDVKGVVHFQQQDE-GPVTVEGKIYGLTDGKHGFHIHEFGDNTNGCISAGPHFNPESK +THGAPEDAVRHVGDLGNVTA-KDGVAEFKLTDSLISLKGNHSIIGRCAVVHEKEDDLGKGGNDESLKTG +NAGGRLACGVIGLCQ >gi|226372562|_Rana_catesbeiana --MKAICVLKGSSEVTGVVRFEQEED-GPVTVTGQITGLTDGKHGFHIHTYGDNTDGCVSAGPHFNPQGK +THGGPDDEVRHVGDLGNVTS-AGGVADINIKDKLISLKGEHSIIGRTAVVHEKEDDLGKGGDNESLITG +NAGGRLACGVIGICQ >gi|116048074|_Scyliorhinus_torazame --MKAICVLKGTGEVTGTVQFDQAGG-GPVTVKGSITGLTPGKHGFHVHAFGDNTNGCISAGPHYNPFLK +THGGPGDEERHVGDLGNVEANGDGVATFEIQDNQLHLSGERSIIGRTLVVHEKEDDLGKGEDEESTRTG +NAGSRLACGVIGIAK >gi|216963348|_Ctenopharyngodon_idella -------------------YFEQEGEKSPVTLSGEITGLTAGKHGFHVHAFGDNTNGCISAGPHFNPYSK +NHGGPTDSERHVGDLGNVIAGENGVAKIDIVDKMLTLSGPDSIIGRTMVIHEKEDDLGKGGNEESLKTG +NAGGRLACGVIGITQ >gi|226232347|_Pimephales_promelas ---------------------------------------------------------------HFNPHTQ +NHGGPTDSARHVGDLGNVTAGENGVAKIDIVDKMLTLSGQHSIIGRTMVIHEKEDDLGKGGNE------ +--------------- >gi|238801237|_Hemibarbus_mylodon MAKKAVCVLKGTGEVTGTVFFEQETDGSPVKLSGTISGLTAGKHGFHVHVFGDNTNGCISAGPHFNPHNK +NHGGPTDGDRHVGDLGNVTAGESGVAKIDIVDKMLTLSGQHSIIGRTMVIHEKEDDLGKGGNEESLKTG +NAGGRLACGVIGITG >gi|47227092|_Tetraodon_nigroviridis MVIKAVCVLKGAGETSGTVYFEQQDEKAPVKLTGEIKGLTAGEHGFHVHAFGDNTNGCISAGPHYNPHNK +THAGPNDENRHVGDLGNVTAEADQIAKIDITDSVISLHGKFSIIGRTMVIHEKADDLGKGGNEESLKTG +NAGGRLACGVIGITQ >gi|225706520|_Osmerus_mordax MVLKAVCVLKGTGEVTGTVFFEQEGDNGPVKLTGEISGLTPGEHGFHVHAFGDNTNGCISAGPHFNPHSK +THGGPTDDVRHVGDLGNVTAGQDNVAKISIQDKHLTLNGVHSIIGRTMVIHEKADDLGKGGNEESLKTG +NAGGRLACGVIGITQ 
>gi|185132317|_Oncorhynchus_mykiss MAMKAVCVLKGTGEVTGTVFFEQEGADGPVKLIGEISGLAPGEHGFHVHAYGDNTNGCMSAGPHFNPHNQ +THGGPTDAVRHVGDLGNVTAGADNVAKINIQDKMLTLTGPDSIIGRTMVIHEKADDLGKGGNEESLKTG +NAGGRQACGVIGIAQ >gi|56790262|_Danio_rerio MVNKAVCVLKGTGEVTGTVYFNQEGEKKPVKVTGEITGLTPGKHGFHVHAFGDNTNGCISAGPHFNPHDK +THGGPTDSVRHVGDLGNVTADASGVAKIEIEDAMLTLSGQHSIIGRTMVIHEKEDDLGKGGNEESLKTG +NAGGRLACGVIGITQ >gi|185135289|_Salmo_salar MALKAVCVLKGTGEVTGTVFFEQEGDGAPVKLTGEIAGLTPGEHGFHVHAFGDNTNGCMSAGPHFNPHNH +THGGPTDTVRHVGDLGNVTAAADSVAKINIQDEILSLAGPHSIIGRTMVIHEKADDLGKGDNEESRKTG +NAGSRLACGVIGIAQ >gi|134284932|_Carassius_auratus ---------------------------------------------FHVHAFGDNTNGCTSAGPHYNPHNQ +THGGPTDSVRHVGDLGNV--------------------------------------------------- +--------------- >gi|110180503|_Oryzias_javanicus ----------------------------------------PGEHGFHVHAFGDNTNGCISAGPHFNPYGK +DHAGPTDEHRHVGDLGNVTANAENVAKLDFTDKVITLAGPHSIIGRTMVIHEKKDDLGKGGNEESLKTG +NA------------- >gi|229365862|_Anoplopoma_fimbria MVVKAVCVLKGAGETSGVVHFEQEGDTAAVKLTGEIIGLTPGEHGFHVHAFGDNTNGCISAGPHFNPHNN +THAGPTDEQRHVGDLGNVTAGGDNIAKIDITDKIITLTGQHSIIGRTMVIHEKADDLGKGGNDESLKTG +NAGARLACGVIGIAQ >gi|226934254|_Dicentrarchus_labrax ---------------------------------------------------------------------- +---------RHVGDLGDVTAGGDNIAKIDITDKMLTLTGPLFIIGRTMVIHEKADDLGKGGNEESLKTG +--------------- >gi|54873355|_Sebastes_schlegelii ---------------------------------GEIKGLTPGEHGFHVHAFGDNTNGCISAGPHFNPHGK +DHAGPTDQERHVGDLGNVTAGAANVAKIDITDKMLTLTGPLSIIRRTMVIHEKKDDLGKGGNEESLKTG +NAGG----------- >gi|62550923|_Sparus_aurata -------------------------------------------------------------------HGK +NHGGPTDAERHVGDLGNVTAGADNVAKIDITDKMLTLSGPLSIIGRTMVIHEKVDDLGKGGNEE----- +--------------- >gi|27462182|_Pagrus_major MVQKAVCVLKGAGETTGVVHFEQESESAPVTLKGEISGLTPDEHGFHVHAFGDNTNGCISAGPHFNPHNK +NHAGPTDAERHVGDLGNVTAGADNVAKIDITDKMLTLNGPFSIIGRTMVIHEKADDLGKGGNEESLKTG +NAGGRLACGVIGICQ >gi|12733941|_Platichthys_flesus 
-----------------------------------IAGLAPGEHGFHVHSFGDNTNGCMSAGPHFNPHGK +NHAGPTDADRHVGDLGNVTAGADNVAEINISDKMLTLNGPNSIIGRTMVIHEKADDLGKGGNDESLKTG +NA------------- >gi|151549024|Paralichthys_olivaceus ------------------------------------------EHGFHVHAFGDNTNGCISAGPHFNPHGK +NHAGPTDAERHVGDLGNVTAGKDNVAEINISDKIITLFGAHSIIGRTMVIHEKADDLGKGGNEESLKTG +NAGARLACGVIG--- >gi|57908848|_Trematomus_bernacchii ---KAVCVFKGTGEASGTVFFEQENDSAPVKLTGEIKGLTPGEHGFHVHAFGDNTNGCISAGPHFNPHNK +THAGPTDEDRHVGDLGNVTAAADNVAKLNITDKMITLAGQYSIIGRTMVIHEKADDLGKGGNDESLKTG +NAGGRLACGVIGIAQ >gi|57908852|Chionodraco_hamatus ---KAVCVFKGAGEASGTVFFEQETDSCPVKLTGEIKGLTPGEHGFHVHAFGDNTNGCISAGPHFNPHNK +THAGPTDENRHVGDLGNVTAAADNVAKLDITDKMITLAGQYSIIGRTMVIHEKADDLGKGGNDESLKTG +NAGGRLACGVIGIAQ >gi|157152709|_Takifugu_obscurus MAMKAVCVLKGAGDTSGTVYFEQENESAPVKLTGEIKGLTPGEHGFHVHAFGDNTNGCISAGPHYNPHNK +THAGPTDADRHVGDLGNVTAGADNIAKIDIKDSMLTLTGPYSIIGRTMVIHEKADDLGKGGNEESLKTG +NAGGRLACGVIGITQ >gi|67772081|_Siniperca_chuatsi --------------------------------------FTPGEHGSHVHVFGDNTNGCISAGPHYNPHGK +NHAGPNDAERHVGDLGNVTAGADNVAKIDITDKMPSLTGPYSIIGRTMVIHEKADDLGKGGNEESLKTG +NAGGRLACGVIGITQ >gi|40218091|_Oreochromis_mossambicus MVLKAVCVLKGTGDTSGTVYFEQENDSAPVKLTGEIKGLTPGEHGFHVHAFGDNTNGCISAGPHFNPYNK +NHGGPKDAERHVGDLGNVTAGADNVAKIEITDKVITLTGRDSIIGRTMVIHEKVDDLXKGGNEESLKTG +NAGGRLACGVIGITQ >gi|37542151|_Epinephelus_malabaricus MVLKAVCVLKGAGETSGTVYFEQETDSAPVKLTGEIKGLTPGEHGFQVHAFGDNTNGCISAGPHFNPHNK +HHAGPTDAERHVGDLGNVTAGGDNVAKIDITDKIITLNGPYSIIGRTMVIHEKADDLGTGGNEESLKTG +NAGGRLACGVIGISQ >gi|56785775|Epinephelus_coioides MDLKAVCVLKGAGETSGTVYFEQESDSAPVKLTGEIKGLTPGEHGFHVHAFGDNTNGCISAGPHFNPHNK +QHAGPTDADRHVGDLGNVTAGGDNVAKIDITDKMLTLNGPYSIIGRTMVIHEKADDLGRGGNDESLKTG +NAGGRLACGVIGIAQ >gi|47607437|_Oplegnathus_fasciatus MVLKAVCVLKGAGETTGTVYFEQESDSAPVKLTGEIKGLTPGEHGFHVHAFGDNTNGCISAGPHFNPHNK +NHAGPNDAERHVGDLGNVTAGADNVAKIDIKDHIITLTGPDSIIGRTMVIHEKADDLGKGGNEESLKTG +NAGGRLACGVIGITQ >gi|115392225|_Rachycentron_canadum 
MVLKAVCVLKGAGETTGTVYFEQESDSAPVKVTGEIKGLTPGEHGFHVHAFGDNTNGCISAGPHFNPHNK +NHAGPNDEERHIGDLGNVTAGADNVAKVDITDKMLTLNGPYSIIGRTMVIHEKADDLGKGGNEESLKTG +NAGGRLACGVIGIAQ >gi|224044145|_Taeniopygia_guttata AAMRAVCVMQGEGAVKGVIHFEQQGT-GPVKVTGEITGLADGEHGFHVHEFGDNTNGCTSAGPHFNPEQK +KHGGPSDAERHVGDLGNVTA-KGGVAQVSIQDSVISLSGPHCIIGRTMVVHERRDDLGRGGNDESLLTG +NAGPRLACGVIGIAK >gi|45384218|_Gallus_gallus ATLKAVCVMKGDAPVEGVIHFQQQGS-GPVKVTGKITGLSDGDHGFHVHEFGDNTNGCTSAGAHFNPEGK +QHGGPKDADRHVGDLGNVTA-KGGVAEVEIEDSVISLTGPHCIIGRTMVVHAKSDDLGRGGDNESKLTG +NAGPRLACGVIGIAK >gi|29373121|_Melopsittacus_undulatus ATLKAVCVMKGEGPVQGVIHFQQQGN-GPVKVTGKISGLADGDHGFHVHEFGDNTNGCTSAGPHFNPEGK +QHGGPSDAERHVGDLGNVTA-KGGVAEVAIEDSIISLSGPHSIVGRTMVVHEKCDDLGRGGDNESKLTG +NAGPRLACGVIGIAK >gi|89515076|_Bufo_gargarizans -MVKAICVLKGNGPVHGIVGFNQDG--GEVTVKGTINGLTDGLHGFHIHVYGDNTNGCMSAGPHFNPHGK +SHGAPEDEERHVGDLGNITS-KDGVAEFEFKDKIISLEGEHNIIGRTAVVHEKADDLGKGGDNESKVTG +NAGGRLACGVIGICQ >gi|226844835|_Trachemys_scripta_elegans ---------------------------------------------------------CTSAGAHFNPNGK +NHGGPQDKERHVGDLGNVIANKDGVAEVSIKDSLISLTGPLSIIGRTMVVHEKEDDLGKGNN------- +--------------- >gi|265797|_Caretta_caretta ---------------------------ATVKAVCVLKGEDPVKEPVKGPVKEPVKGIIYFEQQGN-GPVT +LSGSITGLTEGKHGFHVHEFGDNTNGCTSAGAHFNPPGKNHGGPQDNERHVGDLGNVIANKEGVAEVCI +KDSLISLTGSQSIIG >gi|126352669|_Equus_caballus MALKAVCVLKGDGPVHGVIHFEQQQEGGPVVLKGFIEGLTKGDHGFHVHEFGDNTQGCTTAGAHFNPLSK +KHGGPKDEERHVGDLGNVTADENGKADVDMKDSVISLSGKHSIIGRTMVVHEKQDDLGKGGNEESTKTG +NAGSRLACGVIGIAP >gi|126325231|_Monodelphis_domestica MVLKAVCVLKGDGPVQGTIFFEQKQVGEPVELSGSIKGLAEGDHGFHVHEFGDNTQGCTSAGAHFNPHSK +KHGGPTDEERHVGDLGNVTANKDGVATVSIKDSHIELSGPMSIIGRTMVVHEKADDLGKGGNAESEKTG +NAGPRLACGVIGIAK >gi|130497065|_Oryctolagus_cuniculus MATKAVCVLKGDGPVEATIHFEQKGT-GPVVVKGRITGLTEGLHEFHVHQFGDNRQGCTSAGPHFNPLSK +KHGGPKDEERHVGDLGNVTAGSNGVADVLIEDSVISLSGDMSVIGRTLVVHEKEDDLGKGGNDESTKTG +NAGSRLACGVIGISP >gi|74136167|_Macaca_mulatta MAMKAVCVLKGDSPVQGTINFEQKESNGPVKVWGSITGLTEGLHGFHVHQFGDNTQGCTSAGPHFNPLSR 
+QHGGPKDEERHVGDLGNVTAGKDGVAKVSFEDSVISLSGDHSIIGRTLVVHEKADDLGKGGNEESKKTG +NAGGRLACGVIGIAQ >gi|84579183|_Macaca_fascicularis MAMKAVCVLKGDSPVQGTINFEQKESNGPVKVWGSITGLTEGLHGYHVHQFGDNTQGCTSAGPHFNPLSR +QHGGPKDEERHVGDLGNVTAGKDGVAKVSFEDSVISLSGDHSIIGRTLVVHEKADDLGKGGNEESKKTG +NAGGRLACGVIGIAH >gi|197102620|_Pongo_abelii MATKAVCVLKGDSPVKGIINFEQKERNGPVKVWGSIEGLTEGLHGFHVHEFGDNTVGCTSAGPHFNPLSR +KHGGPKDEERHVGDLGNVTADKDGVVSVSIEDSVISLSGDHCIIGRTLVVHEKADDLGKGGNEESTKTG +NAGSRLACGVIGIAQ >gi|223633904|_Ovis_aries MATKAVCVLKGDGPVQGTIRFEAKGD--KVVVTGSITGLTEGDHGFHVHQFGDNTQGCTSAGPHFNPLSK +KHGGPKDEERHVGDLGNVKADKNGVAIVDIVDPLISLSGEYSIIGRTMVVHERPDDLGRGGNEESTKTG +NAGGRLACGVIGIAP >gi|194672519|_Bos_taurus MATKAVCVLKGDGPVQGTIHFEAKGN--TVVVTGSITGLTEGDHGFHVHQFGDNTQGCTSAGPHFNPLSK +KHSGPKDEERHVGDLGNVTADKNGVAVVDIVDSLISLSGEYSIIGRTMVVHEKPDDLGRGGNEESTKTG +NAGSRLACGVIGIAK >gi|2660692|_Cervus_elaphus MATKAVCVMKGDGPVQGTIRFEAKGN--TVVVTGSITGLTEGDHGFHVHQFGDNTQGCTSAGPHFNPLSK +KHGGPKDEERHVGDLGNVTADKNGVAKVDIVDSLISLSGEHSIIGRTMVVHEKPDDLGRGGNEESTKTG +NARNRLACGVIGIAQ >gi|39578718|_Cavia_porcellus -ATKAVCVLKGDGPVQGIIHFEQKAN-GPVVVKGRITGLVEGKHGFHVHEFGDNTQGCTSAGPHFNPLSK +KHGGPQDEERHVGDLGNVTAGADGVANVSIEDSLISLSGANSIIGRTMVVHEKPDDLGKGGNEESTKTG +NAGSRLACGVIGIAQ >gi|15082144|_Sus_scrofa ---KAVCVLKGDGPVQGTIYFELKGE-KTVLVTGTIKGLAEGDHGFHVHQFGDNTQGCTSAGPHFNPESK +KHGGPKDQERHVGDLGNVTAGKDGVATVYIEDSVIALSGDHSIIGRTMVVHEKPDDLGRGGNEESTKTG +NAGSRLACGVIG--- >gi|281348263|_Ailuropoda_melanoleuca --------------------------------------------------------GCTSAGPHFNPLSK +KHGGPKDEERHVGDLGNVTAGKDGVATVSLEDSLIALSGDHSIIGRTMVVHEKRDDLGKGGNEESTQTG +NAGSRLACGVIGIAK >gi|8394328|_Rattus_norvegicus MAMKAVCVLKGDGPVQGVIHFEQKASGEPVVVSGQITGLTEGEHGFHVHQYGDNTQGCTTAGPHFNPHSK +KHGGPADEERHVGDLGNVAAGKDGVANVSIEDRVISLSGEHSIIGRTMVVHEKQDDLGKGGNEESTKTG +NAGSRLACGVIGIAQ >gi|45597447|_Mus_musculus MAMKAVCVLKGDGPVQGTIHFEQKASGEPVVLSGQITGLTEGQHGFHVHQYGDNTQGCTSAGPHFNPHSK +KHGGPADEERHVGDLGNVTAGKDGVANVSIEDRVISLSGEHSIIGRTMVVHEKQDDLGKGGNEESTKTG +NAGSRLACGVIGIAQ >gi|55925004|_Mus_spretus 
------------------------------------------------HQYGDNTQGCTSAGPHFNPHS- +--------------------------------------------------------------------- +--------------- END