in reply to look for substrings and getting their location

perldoc -q substring
perldoc -f pos
perldoc -f index

update: example

use strict;
use warnings;

my $YBL027W = 'GUAUGUUUAACAGUGAUACUAAAUUUUGAACCUUUCACAAGAUUUAUCUUUAAAUAUGUUAUGA';
my $seq = shift || 'GUAUG';
my @pos;

# regex solution: pos() is the offset just past the match,
# so subtract the length of the match to get its start
while( $YBL027W =~ m/\Q$seq\E/gis ){
    push @pos, pos($YBL027W) - length( $seq );
}

print regex => $/;
printf ' \%d0', $_ for 1 .. 8;
print $/, ( 0 .. 9 ) x 8, $/;
print $YBL027W,$/;
my $req = ' ' x length $YBL027W;
substr($req, $_, 1, '^') for @pos;
print $req, $/;
print "@pos $/";

@pos = ();

# index solution
for(
    my $lindex = index( $YBL027W, $seq);
    $lindex != -1;
    $lindex = index( $YBL027W, $seq, $lindex + length $seq)
    # + length $seq so it matches the m//atch solution
    # otherwise UUU in UUUU would match twice ( [UUU]U and U[UUU] )
) {
    push @pos, $lindex;
}

print $/, index => $/;
printf ' \%d0', $_ for 1 .. 8;
print $/, ( 0 .. 9 ) x 8, $/;
print $YBL027W,$/;
$req = ' ' x length $YBL027W;
substr($req, $_, 1, '^') for @pos;
print $req, $/;
print "@pos $/";

__END__
loose$ perl substring.pl
regex
 \10 \20 \30 \40 \50 \60 \70 \80
01234567890123456789012345678901234567890123456789012345678901234567890123456789
GUAUGUUUAACAGUGAUACUAAAUUUUGAACCUUUCACAAGAUUUAUCUUUAAAUAUGUUAUGA
^
0

index
 \10 \20 \30 \40 \50 \60 \70 \80
01234567890123456789012345678901234567890123456789012345678901234567890123456789
GUAUGUUUAACAGUGAUACUAAAUUUUGAACCUUUCACAAGAUUUAUCUUUAAAUAUGUUAUGA
^
0
loose$ perl substring.pl UUUAA
regex
 \10 \20 \30 \40 \50 \60 \70 \80
01234567890123456789012345678901234567890123456789012345678901234567890123456789
GUAUGUUUAACAGUGAUACUAAAUUUUGAACCUUUCACAAGAUUUAUCUUUAAAUAUGUUAUGA
     ^                                          ^
5 48

index
 \10 \20 \30 \40 \50 \60 \70 \80
01234567890123456789012345678901234567890123456789012345678901234567890123456789
GUAUGUUUAACAGUGAUACUAAAUUUUGAACCUUUCACAAGAUUUAUCUUUAAAUAUGUUAUGA
     ^                                          ^
5 48
loose$
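
The comment in the index loop points out that stepping by length $seq skips overlapping hits (UUU in UUUU is reported once, not twice). If you do want the overlapping hits, here is a minimal sketch of one way to get them. The sub names overlapping_positions and overlapping_positions_regex are made up for this illustration; only index, pos and a zero-width lookahead are used.

```perl
use strict;
use warnings;

# Hypothetical helper: 0-based start offsets of every occurrence of
# $needle in $haystack, overlapping occurrences included.
sub overlapping_positions {
    my ( $haystack, $needle ) = @_;
    my @found;
    return @found unless length $needle;    # an empty needle would match everywhere
    my $from = 0;
    while ( ( my $at = index( $haystack, $needle, $from ) ) != -1 ) {
        push @found, $at;
        $from = $at + 1;    # step by one character, not by length $needle
    }
    return @found;
}

# Hypothetical helper: same result with m//g. The lookahead is zero-width,
# so nothing is consumed and pos() is the start of the hit itself.
sub overlapping_positions_regex {
    my ( $haystack, $needle ) = @_;
    my @found;
    return @found unless length $needle;
    while ( $haystack =~ m/(?=\Q$needle\E)/g ) {
        push @found, pos($haystack);
    }
    return @found;
}

print join( ' ', overlapping_positions( 'UUUU', 'UUU' ) ), "\n";
print join( ' ', overlapping_positions_regex( 'UUUU', 'UUU' ) ), "\n";
```

Both lines print 0 1 for UUU in UUUU. Whether you want overlapping hits or not depends on what you are counting, so pick one behaviour and use it consistently.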

Re: Re: look for substrings and getting their location
by biosysadmin (Deacon) on May 09, 2004 at 17:47 UTC
    Excellent explanation of three functions that anyone doing biological research in Perl should know. :)

    You should also check out the BioPerl modules for your sequence input and output. They will make your program general enough to work with many different sequence formats, not just FASTA. Here's a quick example:

    use Bio::SeqIO;

    my $filename = 'test.seq';
    my $format   = 'fasta';

    my $seqio = Bio::SeqIO->new( -file => $filename, -format => $format );

    while ( my $seqobj = $seqio->next_seq() ) {
        my $raw_sequence = $seqobj->seq;
        # do your searching on this raw sequence
    }

    Hope this helps. :)