in reply to Re: homemade blat: need help with reverse string.
in thread homemade blat: need help with reverse string.

Hello, Thank you for your kind response! I worked on it and this program does everything i need
use strict; my @bases; my @list; my @List; my %oligo = (); my %pos = (); my $base; my $list; my $f; my $nomefile; my $line; my $piece; my $query; my $len; my $sequenza; my $i; my $j; my $rev; my $revcomp; my $oligo; my $length = length($sequenza); my $piececomp; print("Insert query:\n"); $query = <STDIN>; chomp($query); $len = length($query); $query = uc $query; @bases = qw(A G C T); #bases @list = qw(A G C T); #seed for ($i = 0; $i < ($len - 1); $i++) { foreach $base (@bases) { foreach $list (@list) { $oligo{$base.$list} = 1; } @List = keys %oligo; } @list = keys %oligo; %oligo = (); #print "@list xxx \n"; } %oligo = (); @List = sort @List; $nomefile = $ARGV[0]; open $f, "< $nomefile" or die "cannot open $nomefile: $!"; $line = <$f>; chomp($line); if(substr($line,0,1) ne ">") { print STDERR ("Input file $nomefile not in FASTA format\n"); exit; } $sequenza = <$f>; while($line = <$f>) { chomp($line); if(substr($line, 0, 1) ne ">") { $sequenza = $sequenza.$line; } } $sequenza = uc $sequenza; for($i = 0; $i < length($sequenza) - ($len - 1); $i++) # nell'ultimo f +rame di lettura non deve contare la len intera se no va 'fuori sequen +za' e non trova l'oligo corrispondente { $piece = substr($sequenza, $i, $len); if(exists $oligo{$piece}) { $oligo{$piece} = $oligo{$piece} + 1; push(@{$pos{$piece}} , $i ); } else { $oligo{$piece} = 1; push(@{$pos{$piece}} , $i ); } } foreach $piece (sort keys %oligo) { print("$piece\t$oligo{$piece}\t@{$pos{$piece}}\n\n"); } $rev= reverse $query; $rev=~ tr/ACTG/TGAC/; $revcomp = $rev; for($j = $length; $j < $length - $len; $j--) { $revcomp = substr($sequenza, $j, $len); if(exists $oligo{$revcomp}) { push(@{$pos{$revcomp}} , ($length - $j) - ($len - 1)); } } if(exists $oligo{$query} && !exists $oligo{$revcomp}) { print("$query\t compare $oligo{$query} volte\t in posizione @{$pos +{$query}} solo sul filamento positivo\n"); } else { if(exists $oligo{$query} && exists $oligo{$revcomp}) { print("$query\t compare $oligo{$query} volte\t in posizione @{ +$pos{$query}} sul filamento positivo\n\ne compare $oligo{$revcomp} vo +lte\t in posizione @{$pos{$revcomp}}\t sul filamento negativo\n"); } else { if(!exists $oligo{$query} && exists $oligo{$revcomp}) { print("$query compare $oligo{$revcomp} volte\t in posizion +e @{$pos{$revcomp}}\t sul filamento negativo\n"); } else { print("$query non compare nella sequenza\n"); } } }
It tells me if an oligo appears only in positive strand, in positive and negative and only in negative strand by searching in an hash table i created using the length of the query written from keyboard. I'm searching an oligo on the negative strand by searching its reverse complementary on the positive strand. The sequence i'm working on begins with GATCAC and end with ACGATG. The program works fine with these sequence, but when i search their reverse complementary, the position on the negative strand are wrong (if i search the reverse complementary of the first oligo in the positive strand, it should be in the last position of the negative strand, instead the program tells me that it is in the first one). This is the problem! However you've done a great work, and it should help me with my problem!

Replies are listed 'Best First'.
Re^3: homemade blat: need help with reverse string.
by Athanasius (Cardinal) on Dec 17, 2015 at 03:08 UTC

    Looking at just $length and $len, the code shown has the following content:

    ... my $len; my $sequenza; ... my $j; ... my $length = length($sequenza); ... print "Insert query:\n"; $query = <STDIN>; chomp $query; $len = length($query); ... for($j = $length; $j < $length - $len; $j--) { ... } ...

    As you can see, when $length is initialized, $sequenza is undefined, so $length still has a numerical value of zero (as I pointed out above).

    However, even if $length were to be initialized correctly before the for loop, the loop condition still looks wrong to me. $j is initially assigned the value in $length, and is then decremented on each iteration. The stopping condition is $j < $length - $len, and neither $length nor $len is changed in the loop body. So if the condition is initially true, i.e., if $j is initially less than some fixed value, decrementing $j will never make the condition false (well, not until an integer underflow occurs, anyway), which means you effectively have an infinite loop.

    Hope that helps,

    Athanasius <°(((><contra mundum Iustus alius egestas vitae, eros Piratica,

      I noticed looking up at the UCSC browser's BLAT that i was mistaking the evaluation of the position on the single strand! Sure you help was well accepted, thank you very much!