Bforde has asked for the wisdom of the Perl Monks concerning the following question:

Hello

I am new to Perl, so I apologise in advance for the crudeness of my question and of my code.

I am writing a script, mainly in BioPerl, that will allow me to edit a GenBank file. The script reads a tab-delimited text file containing two columns, which are separated into two lists, V1 and V2. The script then parses a GenBank file (an annotated DNA file) and extracts a third list (V3). Each element in V1 is then compared against V3, and where they are equal the script should do something. The problem is that the script does not seem to cycle through each element of V1. Instead, only the result for the final element of V1 is displayed. I have tried a number of different methods to solve this problem, but I always get the same result. Below is a sample of the original code.

The input file 'list' has the format

HP17_05860 HP_01111,

HP17_05865 HP_01112

use Bio::SeqIO;

open (LIST1, "list");
while (<LIST1>){
    ($V1, $V2) = split(/\t/, $_);
}
close(LIST1);

my $seqIO_object = Bio::SeqIO->new(-file=>"infile.gb");
my $seq_object = $seqio_object ->next_seq;

for my $feat_object ($seq_object->get_SeqFeatures){
    if ($feat_object->primary_tag eq "CDS"){
        if ($feat_object->has_tag('locus_tag')){
            for my $V3 ($feat_object ->get_tag_values('locus_tag')){
                if ($V1 eq $V3){
                    print "locus_tag: ", $V1, " is unique\n";
                }
            }
        }
    }
}

Each element in V1 is contained in V3, so with the input above the output should read

locus_tag: HP17_05860 is unique

locus_tag: HP17_05865 is unique

However the output I get is

locus_tag: HP17_05865 is unique

This problem is probably easily solved with the appropriate use of arrays and loops.

regards

Brian

Replies are listed 'Best First'.
Re: Question on loops
by tobyink (Canon) on Dec 12, 2011 at 14:09 UTC

    Each iteration of your while loop overwrites the variables $V1 and $V2, so by the time your while loop is complete, only the final line's values are there.
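The overwriting can be reproduced without BioPerl. The following is a minimal sketch, assuming the two sample lines from the question are held in a hypothetical array `@list_lines` instead of being read from the file; `@all_V1` is likewise an illustrative name.

```perl
use strict;
use warnings;

# Hypothetical stand-in for the two lines of the 'list' file.
my @list_lines = ("HP17_05860\tHP_01111\n", "HP17_05865\tHP_01112\n");

# Scalars assigned inside the loop are overwritten on every pass,
# so only the values from the last line survive.
my ($V1, $V2);
for my $line (@list_lines) {
    ($V1, $V2) = split /\t/, $line;
}
print "after the loop, \$V1 holds only: $V1\n";

# Collecting into an array keeps every first-column value.
my @all_V1;
for my $line (@list_lines) {
    push @all_V1, (split /\t/, $line)[0];
}
print "the array holds: @all_V1\n";
```

Any comparison against `$V1` that runs after the first loop can therefore only ever match the last line of the list.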

    Something like this (untested) should work:

    use Bio::SeqIO;

    my (%all_V1, %all_V2);  # hashes to store $V1 and $V2

    open (LIST1, "list");
    while (<LIST1>) {
        ($V1, $V2) = split /\t/;
        $all_V1{$V1}++;  # FIX: this line and the next were originally
        $all_V2{$V2}++;  # incorrectly prefixed with "push".
    }
    close(LIST1);

    my $seqIO_object = Bio::SeqIO->new(-file=>"infile.gb");
    my $seq_object = $seqIO_object->next_seq;  # variable name is case-sensitive

    for my $feat_object ($seq_object->get_SeqFeatures){
        if ($feat_object->primary_tag eq "CDS"){
            if ($feat_object->has_tag('locus_tag')){
                for my $V3 ($feat_object->get_tag_values('locus_tag')){
                    if (exists $all_V1{$V3}){
                        print "locus_tag: ", $V3, " is unique\n";
                    }
                }
            }
        }
    }

    By the way, $V1, $V2 and $V3 are very badly named variables. If you want to be able to come back to your code in a few months and fix a bug, it really helps if your variables have useful names.
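As an illustration of the naming point, here is a sketch of the list-reading step with descriptive names. All names are hypothetical (`read_locus_tag_pairs`, `%partner_for`, `$list_tag`, `$partner_tag`), since the thread does not say what the two columns mean. The input is read from an in-memory string so the sketch runs without any files, and a plain array stands in for the values that `get_tag_values('locus_tag')` would return.

```perl
use strict;
use warnings;

# Hypothetical helper: read a tab-delimited list from a filehandle and
# return a hash mapping each first-column tag to its second-column partner.
sub read_locus_tag_pairs {
    my ($fh) = @_;
    my %partner_for;
    while (my $line = <$fh>) {
        chomp $line;
        next unless length $line;    # skip blank lines
        my ($list_tag, $partner_tag) = split /\t/, $line;
        $partner_for{$list_tag} = $partner_tag;
    }
    return %partner_for;
}

# Demo input held in a string instead of the 'list' file.
my $sample_list = "HP17_05860\tHP_01111\nHP17_05865\tHP_01112\n";
open my $list_fh, '<', \$sample_list
    or die "Cannot open in-memory list: $!";
my %partner_for = read_locus_tag_pairs($list_fh);
close $list_fh;

# Stand-in for the locus tags extracted from the GenBank features.
my @genbank_locus_tags = ('HP17_05860', 'HP17_05865', 'HP17_05870');

for my $locus_tag (@genbank_locus_tags) {
    print "locus_tag: $locus_tag is in the list\n"
        if exists $partner_for{$locus_tag};
}
```

Keeping the second column as the hash value, rather than a counter, is a design choice for this sketch: it leaves the partner tag available at the point where a match is found.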

      Hello tobyink

      Thank you for the quick response. I figured that V1 and V2 were being overwritten and had started to look at hashes. I understand what they are doing, but not how to use them correctly.

      The code does have an error:

      Type of arg 1 to push must be array (not postincrement (++)) at test.pl line 10, near "++;"

      Type of arg 1 to push must be array (not postincrement (++)) at test.pl line 10, near "++;"

      Must the V1 and V2 variables be changed to arrays before being added to the hash?

      Brian

      I can assure you that my actual variables have appropriate names.

        Meh. Remove the two pushes. Should just be:

        while (<LIST1>) { ($V1, $V2) = split /\t/; $all_V1{$V1}++; $all_V2{$V2}++; }

        ... the pushes were leftovers from an earlier response I typed up but didn't submit, which used arrays.

        I'll correct the code in my original answer too.
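To make the difference behind that error message concrete, the following sketch contrasts the two storage styles. The variable names (`@tags_from_list`, `@seen_in_order`, `%times_seen`) are illustrative only, and one tag is repeated on purpose to show the counting.

```perl
use strict;
use warnings;

# Illustrative input; the first tag appears twice on purpose.
my @tags_from_list = ('HP17_05860', 'HP17_05865', 'HP17_05860');

# Array version: push appends to an array, so its first argument
# must be an array such as @seen_in_order.
my @seen_in_order;
push @seen_in_order, $_ for @tags_from_list;

# Hash version: no push is involved; incrementing a key creates it
# on first use and counts repeats afterwards.
my %times_seen;
$times_seen{$_}++ for @tags_from_list;

print "array holds ", scalar(@seen_in_order), " entries\n";
print "hash holds ", scalar(keys %times_seen), " distinct keys\n";
print "HP17_05860 seen $times_seen{'HP17_05860'} times\n";
```

The scalars do not need to become arrays first: each value read from the file is used directly as a hash key.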

Re: Question on loops
by TJPride (Pilgrim) on Dec 12, 2011 at 15:42 UTC
    Also untested, since I don't have Bio::SeqIO:

    use Bio::SeqIO;
    use strict;
    use warnings;

    my @V;

    open (LIST1, 'list') || die;
    while (<LIST1>){
        push @V, (split(/\t/, $_))[0];
    }
    close(LIST1);

    my $seqIO_object = Bio::SeqIO->new(-file => 'infile.gb');
    my $seq_object = $seqIO_object->next_seq;

    for my $feat_object ($seq_object->get_SeqFeatures) {
        if ($feat_object->primary_tag eq 'CDS' && $feat_object->has_tag('locus_tag')) {
            for my $V3 ($feat_object->get_tag_values('locus_tag')) {
                for my $V1 (@V) {
                    print "locus_tag $V1 is unique\n" if $V1 eq $V3;
                }
            }
        }
    }

    I could probably streamline this further if I knew exactly what you were trying to do, but oh well.

      Thanks all for answering. Books are being read and tutorials done.

      TJPride, your code works if I change the last two lines to:

      for my $V1 (@V){
          if ($V1 eq $V3){
              print "locus_tag $V1 is unique\n";
          }
      }
Re: Question on loops
by umasuresh (Hermit) on Dec 12, 2011 at 15:37 UTC