BioNick has asked for the wisdom of the Perl Monks concerning the following question:

Dear Monks,

Same question as before, this time no longer as an anonymous monk.

My original question:

"At a certain point in my code the Bio::SimpleAlign add_seq method is called. This method will give a warning if (chunk of the add_seq code):

if( $self->{'_seq'}->{$name} ) {
    $self->warn("Replacing one sequence [$name]\n") unless $self->verbose < 0;
}
What happens here?"

You kindly replied that the most likely cause was two sequences with the same name. I checked this and it is true: several sequences share the same name. But only one of them invokes this warning!?!

Now what?

To clarify:

Sample input file:
hit1_AM161438.1_1-497/20-516              gAGAAACCCUGGAA
hit1_AM161438.1_1-497/1-1:497-993         gGAAAAUCCGUCGA
hit1_EF374296.1_1-432/1-432               uauGGAAACWUACU
hit1_EF374296.1_1-432/1-1:432-863         UGAAAAUCCGUCGA
hit1_EF374296.1_509-949/509-509:949-1389  GGAAAAUCCGUCGA
hit1_EF374296.1_509-949/938-1382          AUAGUAAGAGGAAA
hit1_EF374297.1_30-470/30-30:470-910      GGAAAAUCCGUCGA

Warning message:
-------------------- WARNING ---------------------
MSG: Replacing one sequence [hit1_EF374296.1_1-432/1-432]
---------------------------------------------------

My script:
use strict;
use warnings;
use MyAlign;

my $aln = new MyAlign;
open ( ALIGN, "<inputAlignmentFile" ) or die "failed to open inputAlignmentFile";
$aln -> read_alignment ( \*ALIGN );
close ALIGN;

The package:

package MyAlign;               # assumed: the package line was not shown in the post

use strict;
use warnings;
use Bio::SimpleAlign;
use Bio::LocatableSeq;         # assumed: needed for the constructor call below
use base 'Bio::SimpleAlign';   # assumed: SUPER::new needs a parent class

sub new {
    my $caller = shift;
    my $class  = ref( $caller ) || $caller;
    my $self   = $class -> SUPER::new();
    return $self;
}

sub read_alignment {
    my $self = shift;
    my $in   = shift;
    my( %align, %c2name, $count );

    # Collect the sequence for each name, remembering first-seen order.
    while( <$in> ) {
        /^([^\#]\S+)\s+([A-Za-z\.\-]+)\s*/ && do {
            my $name = $1;
            my $seq  = $2;
            if( ! defined $align{$name} ) {
                $count++;
                $c2name{$count} = $name;
            }
            $align{$name} .= $seq;
            next;
        };
    }

    $count = 0;
    foreach my $no ( sort { $a <=> $b } keys %c2name ) {
        my $name = $c2name{$no};
        my( $seqname, $start, $end, $strand );

        # Names look like "id/start-end" or "id/ns-s:e-ne".
        if( $name =~ /(\S+)\/(\d+)-(\d+)$/ ) {
            $seqname = $1;
            $start   = $2;
            $end     = $3;
        }
        elsif ( $name =~ /(\S+)\/(\d+)-(\d+):(\d+)-(\d+)/ ) {
            $seqname = $1;
            my $ns = $2;
            my $s  = $3;
            my $e  = $4;
            my $ne = $5;
            $start  = "$ns-$s";
            $end    = "$e-$ne";   # surprise: this is legal
            $strand = 1;
        }

        my $seq = new Bio::LocatableSeq(
            '-seq'    => $align{$name},
            '-id'     => $seqname,
            '-start'  => $start,
            '-end'    => $end,
            '-strand' => $strand,
            '-type'   => 'aligned'
        );
        $self -> add_seq($seq);
        $count++;
    }
    return $count;
}

1;

Re: [BioPerl] add_seq gives warning: why?
by BioLion (Curate) on Jan 12, 2010 at 12:06 UTC

    There were a few issues with your posted code, but in simple script form this gave the same result:

    The problem, I think, was using $seqname as your object id rather than the full (unique) id. Maybe Bio::SimpleAlign needs unique ids?
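One way to see which names collide is to run the two regexes from read_alignment over the sample names and build an "id/start-end" key for each. The sketch below does that in plain Perl. The `%d` formatting inside `alignment_key` is an assumption about how some BioPerl releases may build the key in add_seq; it is not confirmed anywhere in this thread. `parse_name` and `alignment_key` are hypothetical helper names.

```perl
use strict;
use warnings;

# Sample names taken from the question.
my @names = qw(
    hit1_AM161438.1_1-497/20-516
    hit1_AM161438.1_1-497/1-1:497-993
    hit1_EF374296.1_1-432/1-432
    hit1_EF374296.1_1-432/1-1:432-863
    hit1_EF374296.1_509-949/509-509:949-1389
    hit1_EF374296.1_509-949/938-1382
    hit1_EF374297.1_30-470/30-30:470-910
);

# Split a name into (id, start, end) with the same regexes as read_alignment.
sub parse_name {
    my ($name) = @_;
    if ( $name =~ /(\S+)\/(\d+)-(\d+)$/ ) {
        return ( $1, $2, $3 );
    }
    elsif ( $name =~ /(\S+)\/(\d+)-(\d+):(\d+)-(\d+)/ ) {
        return ( $1, "$2-$3", "$4-$5" );
    }
    return;
}

# ASSUMPTION: the alignment indexes sequences by sprintf("%s/%d-%d", ...).
# Under %d, a start of "1-1" becomes 1 and an end of "432-863" becomes 432.
sub alignment_key {
    my ( $id, $start, $end ) = @_;
    no warnings 'numeric';
    return sprintf( "%s/%d-%d", $id, $start, $end );
}

my %seen;
my @collisions;
for my $name (@names) {
    my ( $id, $start, $end ) = parse_name($name) or next;
    my $key = alignment_key( $id, $start, $end );
    push @collisions, $key if $seen{$key}++;
}
print "collision: $_\n" for @collisions;
```

Under that assumption only one key is produced twice, hit1_EF374296.1_1-432/1-432, which is the name in the reported warning. If the installed BioPerl builds its key differently, the result will differ.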

    Anyway, I added in the bit that makes sure the ids are unique (currently commented out in the above) but still make sense to you(?). It gives the same output as above, but the warning is gone.

    Maybe this is a little bit like just turning off warnings... but the problem does stem from your ids, not the code, so I think this is a reasonable workaround, and it doesn't rely on users always having to provide unique ids...

    Hope this helps?

    Just a something something...
      Your code makes sense: you add an x to a name if it already exists, making it unique. This may lead to names with endless rows of x's (in theory; in the real world there will not be many more than 2 or 3), but they are still unique because of them. It's a very simple workaround and I like it! In the end, I don't want the x's to turn up in my alignment files, but I can easily remove them later.
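The workaround described here can be sketched roughly as follows. This is a reconstruction from the description, not the code from the reply; `make_unique_id` and `strip_marker` are hypothetical helper names.

```perl
use strict;
use warnings;

# Append "x" to an id until it has not been seen before, then record it.
sub make_unique_id {
    my ( $id, $seen ) = @_;
    $id .= 'x' while $seen->{$id};
    $seen->{$id} = 1;
    return $id;
}

# Remove the trailing x's again before writing the alignment out.
# Caveat: this also strips an "x" that belonged to the original id.
sub strip_marker {
    my ($id) = @_;
    $id =~ s/x+$//;
    return $id;
}

my %seen;
my @ids = map { make_unique_id( $_, \%seen ) } qw(seqA seqA seqB seqA);
print join( ' ', @ids ), "\n";    # prints: seqA seqAx seqB seqAxx
```

In read_alignment, the unique id would be computed from $seqname just before the Bio::LocatableSeq object is built.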

      Thanks!

        Probably a better workaround might be to use the full name (including the position info) as the id, rather than what is currently happening. Presumably there can't be sequences with the same base name and position? This way you don't have to worry about taking x's off etc...
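Before relying on the full name as the id, it may be worth checking that the full names really are unique in the input. A minimal sketch, with `duplicate_names` as a hypothetical helper and four of the sample names from the question:

```perl
use strict;
use warnings;

# Return every name that occurs more than once, in sorted order.
sub duplicate_names {
    my @names = @_;
    my %count;
    $count{$_}++ for @names;
    return grep { $count{$_} > 1 } sort keys %count;
}

my @names = qw(
    hit1_EF374296.1_1-432/1-432
    hit1_EF374296.1_1-432/1-1:432-863
    hit1_EF374296.1_509-949/509-509:949-1389
    hit1_EF374296.1_509-949/938-1382
);

my @dups = duplicate_names(@names);
print @dups ? "duplicates: @dups\n" : "all full names are unique\n";
```

If the check passes, the change in read_alignment would presumably be to pass '-id' => $name instead of '-id' => $seqname. That is untested here and may affect how the names are written back out.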

        Just a something something...