Re: [BioPerl] add_seq gives warning: why?

There were a few issues with your posted code, but in simple script form this gave the same result:

use strict;
use warnings;

++$|; ## buffering off

my %align;
my $count = 0;
my %c2name;

use Bio::SimpleAlign;
use Bio::LocatableSeq;   ## loaded explicitly, since the objects are built below

my $self = Bio::SimpleAlign->new();


print "---> Reading data\n";
while( <DATA> ) {
    /^([^\#]\S+)\s+([A-Za-z\.\-]+)\s*/ && do {      
        my $name = $1;
        my $seq = $2;
        if( ! defined $align{$name}  ) {
            $count++;
            $c2name{$count} = $name;
        }
        $align{$name} .= $seq;
        print "Count >$count< - Adding Name >$name<\n\tSeq >$seq<\n";
    };
}

print "---> Forming alignment\n";

$count = 0;
foreach my $no ( sort { $a <=> $b } keys %c2name ) {
    my $name = $c2name{$no};
    my( $seqname, $start, $end, $strand );
    if( $name =~ /(\S+)\/(\d+)-(\d+)$/ ) {
        $seqname = $1; $start = $2; $end = $3;
    } elsif ( $name =~ /(\S+)\/(\d+)-(\d+):(\d+)-(\d+)/ ) {
        $seqname = $1; my $ns = $2; my $s = $3; my $e = $4; my $ne = $5;
        $start = "$ns-$s"; $end = "$e-$ne"; # surprise: this is legal
        $strand = 1;
    }

    ## make sure id is unique
    #$seqname .= 'x' while ( exists $align{id}{$seqname} );
    #++$align{id}{$seqname};

    print "Name >$name<\n\tID >$seqname<\n";
    my $seq = Bio::LocatableSeq->new( '-seq'=>$align{$name}, '-id'=>$seqname,
        '-start'=>$start, '-end'=>$end, '-strand'=>$strand, '-type'=>'aligned' );
    $self->add_seq($seq);
    $count++;
}
print "Count : $count\n";

__DATA__
hit1_EF374296.1_1-432/1-432              uauGGAAACWUACU
hit1_AM161438.1_1-497/20-516             gAGAAACCCUGGAA
hit1_AM161438.1_1-497/1-1:497-993        gGAAAAUCCGUCGA
hit1_EF374296.1_1-432/1-1:432-863        UGAAAAUCCGUCGA
hit1_EF374296.1_509-949/509-509:949-1389 GGAAAAUCCGUCGA
hit1_EF374296.1_509-949/938-1382         AUAGUAAGAGGAAA
hit1_EF374297.1_30-470/30-30:470-910     GGAAAAUCCGUCGA

which gave:

---> Reading data
Count >1< - Adding Name >hit1_EF374296.1_1-432/1-432<
    Seq >uauGGAAACWUACU<
Count >2< - Adding Name >hit1_AM161438.1_1-497/20-516<
    Seq >gAGAAACCCUGGAA<
Count >3< - Adding Name >hit1_AM161438.1_1-497/1-1:497-993<
    Seq >gGAAAAUCCGUCGA<
Count >4< - Adding Name >hit1_EF374296.1_1-432/1-1:432-863<
    Seq >UGAAAAUCCGUCGA<
Count >5< - Adding Name >hit1_EF374296.1_509-949/509-509:949-1389<
    Seq >GGAAAAUCCGUCGA<
Count >6< - Adding Name >hit1_EF374296.1_509-949/938-1382<
    Seq >AUAGUAAGAGGAAA<
Count >7< - Adding Name >hit1_EF374297.1_30-470/30-30:470-910<
    Seq >GGAAAAUCCGUCGA<
---> Forming alignment
Name >hit1_EF374296.1_1-432/1-432<
    ID >hit1_EF374296.1_1-432<
Name >hit1_AM161438.1_1-497/20-516<
    ID >hit1_AM161438.1_1-497<
Name >hit1_AM161438.1_1-497/1-1:497-993<
    ID >hit1_AM161438.1_1-497<
Name >hit1_EF374296.1_1-432/1-1:432-863<
    ID >hit1_EF374296.1_1-432<
Name >hit1_EF374296.1_509-949/509-509:949-1389<
    ID >hit1_EF374296.1_509-949<

-------------------- WARNING ---------------------
MSG: Replacing one sequence [hit1_EF374296.1_1-432/1-432]

---------------------------------------------------
Name >hit1_EF374296.1_509-949/938-1382<
    ID >hit1_EF374296.1_509-949<
Name >hit1_EF374297.1_30-470/30-30:470-910<
    ID >hit1_EF374297.1_30-470<
Count : 7

The problem, I think, was using $seqname as your object id, rather than the full (unique) id. Bio::SimpleAlign needs unique ids, maybe?
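To see the collision without BioPerl in the picture, here is a minimal sketch that only counts repeated ids. The names are copied from the __DATA__ section above; the `duplicates` helper and the `@base_ids` array are my own illustration and not part of the original script, and the sketch says nothing about how Bio::SimpleAlign itself keys its sequences.

```perl
use strict;
use warnings;

# Names copied from the __DATA__ section of the script above.
my @names = (
    'hit1_EF374296.1_1-432/1-432',
    'hit1_AM161438.1_1-497/20-516',
    'hit1_AM161438.1_1-497/1-1:497-993',
    'hit1_EF374296.1_1-432/1-1:432-863',
);

# Hypothetical helper: return every value that occurs more than once.
sub duplicates {
    my %n;
    $n{$_}++ for @_;
    my @dup = sort grep { $n{$_} > 1 } keys %n;
    return @dup;
}

# The part before the '/' is what the script uses as $seqname.
my @base_ids = map { my ($base) = split m{/}, $_; $base } @names;

my @dup_base = duplicates(@base_ids);
my @dup_full = duplicates(@names);

print "Repeated base ids : @dup_base\n";
print "Repeated full names: ", scalar(@dup_full), "\n";
```

The base ids repeat while the full names do not, which fits the idea that the warning comes from the ids rather than from the alignment code.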

Anyway, I added in the bit that makes sure the ids are unique (currently commented out in the above) but should still make sense to you(?), and it gives the same output as above, but the warning is gone.

Maybe this is a little bit like just turning off warnings... but the problem does stem from your ids, not the code, so I think this is a reasonable workaround, which doesn't rely on users always having to provide unique ids...
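The uniqueness step can also be tried on its own. This is only a sketch of the same idea as the two commented-out lines in the script: the `unique_id` sub and the separate `%seen` hash are hypothetical names of mine (the script itself keeps the bookkeeping in `$align{id}`).

```perl
use strict;
use warnings;

my %seen;    # ids handed out so far (hypothetical bookkeeping hash)

# Append 'x' until the id has not been used before, then record it.
sub unique_id {
    my ($id) = @_;
    $id .= 'x' while exists $seen{$id};
    $seen{$id} = 1;
    return $id;
}

# The third id repeats the first, so it should come back with an 'x' added.
print unique_id($_), "\n"
    for qw( hit1_EF374296.1_1-432 hit1_AM161438.1_1-497 hit1_EF374296.1_1-432 );
```

Keeping the seen ids in their own hash avoids mixing them with the sequence strings stored in %align, but that is a design preference rather than a requirement.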

Hope this helps?

Just a something something...

Re^2: [BioPerl] add_seq gives warning: why? by BioNick (Initiate) on Jan 12, 2010 at 13:14 UTC
Your code makes sense: you add an x to a name if it already exists, making it unique. This may lead to names with endless rows of x's (in theory; in the real world there will not be many more than 2 or 3), but they are still unique because of them. It's a very simple workaround and I like it! In the end, I don't want the x's to turn up in my alignment files, but I can easily remove them later. Thanks!
Re^3: [BioPerl] add_seq gives warning: why? by BioLion (Curate) on Jan 12, 2010 at 13:42 UTC
Probably a better workaround might be to use the full name (including the position info) as the id, rather than what is currently happening. Presumably there can't be sequences with the same base name and position? This way you don't have to worry about taking the x's off, etc.

Just a something something...