Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

Hello, I have to align a set of small DNA sequences against one large sequence. There is only a single large sequence (perhaps 10 kb in size), but there may be more than 5000 small sequences. I tried aligning them with the BioPerl StandAloneBlast module, but I ran into problems with that too. I could not use blastall, since the large sequence is just a single sequence, unlike the small sequences, whereas bl2seq is only for aligning two sequences to each other. Is there any other way of doing this? Any suggestions? Thanks in advance.

Re: Aligning sequence
by planetscape (Chancellor) on Feb 21, 2008 at 17:13 UTC
Re: Aligning sequence
by igelkott (Priest) on Feb 21, 2008 at 17:12 UTC
    In Blast, increase the Expect value (10000) and decrease Word size (half the query length).
      Hello all, here is my code and the result I am getting with Blast:
      use strict;
      use Bio::SeqIO;
      use Bio::Tools::Run::StandAloneBlast;
      my %srna;
      my %genome;
      my $seqio_srna = Bio::SeqIO->new(- file=>'/home/jayakuma/script/graphics/JH_TCV_lateSL.clones.filtered.23_1_2008.fa',
          -format=>'fasta')
          or die "cannot open file\n";
      my $seqio_large = Bio::SeqIO->new(-file => '/home/jayakuma/script/graphics/TCV-jagger.fna',
          -format =>'fasta')
          or die " cannot opne the file\n";
      while (my $large_seq = $seqio_large->next_seq()){
          my $id = $large_seq->display_id;
          my $seq = $large_seq->seq;
          $genome{$id} = $seq;
          while (my $seq_srna = $seqio_srna->next_seq()){
              my $display_id = $seq_srna->display_id;
              my $seq = $seq_srna->seq;
              $srna{$display_id} = $seq;
              #print "seq: $seq\n";
          }
          foreach my $seq_id (keys %srna){
              my $srnas = $srna{$seq_id};
              print "srnas: $srnas\n";
              foreach($srnas){
                  my $blast = Bio::Tools::Run::StandAloneBlast->new(program=>'blastn',database=>'/home/jayakuma/script/graphics/TCV-jagger.fna');
                  my $input = Bio::Seq->new(-seq=> $srnas);
                  print $input->seq, "\n";
                  my $blast_report = $blast->blastall($input);
                  while (my $result = $blast_report->next_result()){
                      my $query_length = $result->query_length();
                      while (my $hit = $result->next_hit()){
                          my $id = $hit->accession();
                          while (my $hsp = $hit->next_hsp()){
                              if($hsp->frac_identical == 1 && $hsp->length == $query_length){
                                  print "$srnas\t$id\n";
                              }
                          }
                      }
                  }
              }
          }
      }
      The error that I am getting in this case is:
      srnas: TTTGCAGTATTGGACAAGCC
      TTTGCAGTATTGGACAAGCC
      Use of uninitialized value in pattern match (m//) at /usr/share/perl5/Bio/SeqIO/fasta.pm line 193, <GEN0> line 81643.
      Use of uninitialized value in print at /usr/share/perl5/Bio/Root/IO.pm line 407, <GEN0> line 81643.
      -------------------- WARNING ---------------------
      MSG: cannot find path to blastall
      ---------------------------------------------------
      Can't call method "next_result" on an undefined value at aligning.pl line 34, <GEN0> line 81643.
      It's dying at blastall and thus obviously can't get any results.
      I have two files: one is the large file, which contains the large sequence (about 4 kb in size),
      and the other contains the set of small sequences (around 4000 of them).
      The large sequence's path is given as the database. Any help, please? Thanks in advance.
        May just be a typo but on line 6, "- file" should be "-file".

        From the first error message, I'd guess that this was an input format error. Little things like spaces in the fasta ID could kill the parser. Check your input around line 81643. If this script works on a small portion of the input, try with a region around the error.

        The second error message could just be an artifact because it really should know where "blastall" is kept. Just to make sure, check $blast->executable('blastall').
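To follow up on the input-format guess above, here is a minimal sketch of a sanity check for a FASTA file, written in plain Perl without BioPerl. The helper name `check_fasta` and the rules it applies (an ID must follow `>` directly, and sequence lines may contain only A, C, G, T, U or N) are assumptions made for illustration, not BioPerl's own parsing rules, so treat any line it reports as something to inspect rather than as a definite error.

```perl
use strict;
use warnings;

# Hypothetical helper: scan FASTA text from a filehandle and return a
# list of human-readable problem descriptions (empty list if none found).
sub check_fasta {
    my ($fh) = @_;
    my @problems;
    my $seen_header = 0;
    while (my $line = <$fh>) {
        chomp $line;
        $line =~ s/\r$//;              # tolerate Windows line endings
        next if $line =~ /^\s*$/;      # skip blank lines
        if ($line =~ /^>/) {
            $seen_header = 1;
            # Assumed rule: the ID must start right after '>'
            push @problems, "line $.: header has no ID directly after '>'"
                unless $line =~ /^>\S/;
        }
        elsif (!$seen_header) {
            push @problems, "line $.: sequence data before first header";
        }
        elsif ($line =~ /[^ACGTUNacgtun]/) {
            push @problems, "line $.: unexpected character in sequence";
        }
    }
    return @problems;
}

# Demonstration on an in-memory sample; replace with a real file to use it.
my $sample = ">seq1\nACGT\n> \nAC-GT\n";
open my $fh, '<', \$sample or die "cannot open in-memory sample: $!";
my @problems = check_fasta($fh);
close $fh;
print "$_\n" for @problems;
```

To run it on real input, open the FASTA file instead of the in-memory string and look at the lines it reports, starting with any near the line number from the warning.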

Re: Aligning sequence
by kyle (Abbot) on Feb 21, 2008 at 16:43 UTC

    Can you tell us more about the problem? How is a sequence stored (a string?)? What qualifies as alignment?

      For example, let the large sequence be
      ATGCGGGCCC
      and the set of small sequence be
      ATG
      ACG
      GGG
      ATCCGGGCCG
      then
      ATGCGGGCC
      ATG------
      ATGCGGCC
      A-G-----
      and so on...
      I hope this helps to show what I mean by alignment.
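If only perfect, full-length matches are wanted (the posted script keeps just the hits with `frac_identical == 1` over the whole query), a plain substring search may be enough and avoids BLAST altogether. The following is a minimal sketch under that assumption: it finds exact matches on the forward strand only and does not handle mismatches, gaps, or the reverse complement. The helper name `exact_hits` is made up for this example, and the sequences are the ones from the example above.

```perl
use strict;
use warnings;

# Return the 0-based start positions of every exact (possibly
# overlapping) occurrence of $query in $target.
sub exact_hits {
    my ($target, $query) = @_;
    my @starts;
    return @starts if $query eq '';
    my $pos = index($target, $query);
    while ($pos >= 0) {
        push @starts, $pos;
        $pos = index($target, $query, $pos + 1);   # allow overlapping hits
    }
    return @starts;
}

my $large = 'ATGCGGGCCC';
my @small = qw(ATG ACG GGG ATCCGGGCCG);

for my $query (@small) {
    my @starts = exact_hits($large, $query);
    if (@starts) {
        for my $start (@starts) {
            # Draw the hit under the large sequence, padded with '-'
            my $pad_right = length($large) - $start - length($query);
            print "$large\n", '-' x $start, $query, '-' x $pad_right, "\n";
        }
    }
    else {
        print "$query: no exact match\n";
    }
}
```

With a single large sequence of a few kb and a few thousand short queries, this kind of scan is cheap. Anything that needs mismatches or gaps, like the `A-G` case in the example, would still need a real aligner.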