Re: look for substrings and getting their location

perldoc -q substring
perldoc -f pos
perldoc -f index

update: example

use strict;
use warnings;

my $YBL027W = 'GUAUGUUUAACAGUGAUACUAAAUUUUGAACCUUUCACAAGAUUUAUCUUUAAAUAUGUUAUGA';

my $seq = shift || 'GUAUG';
my @pos;
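while( $YBL027W =~ m/\Q$seq\E/gis ){   # \Q..\E escapes metacharacters in $seq; /i ignores case, unlike index()
    push @pos, pos($YBL027W) - length( $seq );   # pos() is the offset just past the match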
}


print regex => $/;
printf '       \%d0', $_ for 1 .. 8;
print $/, ( 0 .. 9 ) x 8, $/;
print $YBL027W,$/;


my $req = ' ' x length $YBL027W;

substr($req, $_, 1, '^')
    for @pos;
print  $req, $/;
print "@pos $/";

@pos = ();

for(
    my $lindex = index( $YBL027W, $seq);
    $lindex != -1;
    $lindex = index( $YBL027W, $seq, $lindex + length $seq)
        # + length $seq   so it matches the m//atch solution
        # otherwise UUU in UUUU would match twice ( [UUU]U and U[UUU] )
) {
    push @pos, $lindex;
}


print $/, index => $/;
printf '       \%d0', $_ for 1 .. 8;
print $/, ( 0 .. 9 ) x 8, $/;
print $YBL027W,$/;

$req = ' ' x length $YBL027W;

substr($req, $_, 1, '^')
    for @pos;
print  $req, $/;
print "@pos $/";

__END__
loose$ perl substring.pl
regex
       \10       \20       \30       \40       \50       \60       \70       \80
01234567890123456789012345678901234567890123456789012345678901234567890123456789
GUAUGUUUAACAGUGAUACUAAAUUUUGAACCUUUCACAAGAUUUAUCUUUAAAUAUGUUAUGA
^
0

index
       \10       \20       \30       \40       \50       \60       \70       \80
01234567890123456789012345678901234567890123456789012345678901234567890123456789
GUAUGUUUAACAGUGAUACUAAAUUUUGAACCUUUCACAAGAUUUAUCUUUAAAUAUGUUAUGA
^
0

loose$ perl substring.pl  UUUAA
regex
       \10       \20       \30       \40       \50       \60       \70       \80
01234567890123456789012345678901234567890123456789012345678901234567890123456789
GUAUGUUUAACAGUGAUACUAAAUUUUGAACCUUUCACAAGAUUUAUCUUUAAAUAUGUUAUGA
     ^                                          ^
5 48

index
       \10       \20       \30       \40       \50       \60       \70       \80
01234567890123456789012345678901234567890123456789012345678901234567890123456789
GUAUGUUUAACAGUGAUACUAAAUUUUGAACCUUUCACAAGAUUUAUCUUUAAAUAUGUUAUGA
     ^                                          ^
5 48

loose$
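The index loop skips ahead by `length $seq`, so, as its comment says, hits never overlap and UUU is found only once in UUUU. If you do want overlapping hits, one option (a sketch, not taken from the post above) is to match a zero-width lookahead and read each start offset from `$-[0]`, or to advance `index` by one character instead of by the motif length. The sub names `regex_offsets` and `index_offsets` are hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: start offsets of every $motif in $str, found with
# a regex. With $overlap true, hits may overlap (UUU is found twice in UUUU).
sub regex_offsets {
    my ( $str, $motif, $overlap ) = @_;
    return () unless length $motif;    # an empty motif would match everywhere
    my @pos;
    if ($overlap) {
        # The lookahead is zero-width, so the next attempt starts just one
        # character further on; $-[0] holds the offset where the match began.
        push @pos, $-[0] while $str =~ /(?=\Q$motif\E)/g;
    }
    else {
        # A normal match consumes the motif, so the search resumes after it.
        push @pos, $-[0] while $str =~ /\Q$motif\E/g;
    }
    return @pos;
}

# The same idea with index(): step by 1 for overlapping hits, or by the
# motif length to skip past each hit as the loop in the post does.
sub index_offsets {
    my ( $str, $motif, $overlap ) = @_;
    return () unless length $motif;    # a zero step would loop forever
    my $step = $overlap ? 1 : length $motif;
    my @pos;
    for ( my $i = index( $str, $motif );
          $i != -1;
          $i = index( $str, $motif, $i + $step ) )
    {
        push @pos, $i;
    }
    return @pos;
}

print join( ' ', regex_offsets( 'UUUU', 'UUU', 0 ) ), "\n";    # prints 0
print join( ' ', regex_offsets( 'UUUU', 'UUU', 1 ) ), "\n";    # prints 0 1
print join( ' ', index_offsets( 'UUUU', 'UUU', 1 ) ), "\n";    # prints 0 1
```

Reading `$-[0]` also avoids the `pos() - length` arithmetic used in the regex loop.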

MJD says "you can't just make shit up and expect the computer to know what you mean, retardo!"
I run a Win32 PPM repository for perl 5.6.x and 5.8.x -- I take requests (README).
** The third rule of perl club is a statement of fact: pod is sexy.


Re: Re: look for substrings and getting their location
by biosysadmin (Deacon) on May 09, 2004 at 17:47 UTC

Excellent explanation of three functions that anyone doing biological research in Perl should know. :) You should also check out using the BioPerl modules for your sequence input and output; it will make your program general enough to handle many different sequence formats, not just FASTA. Here's a quick example:

use Bio::SeqIO;

my $filename = 'test.seq';
my $format = 'fasta';

my $seqio = Bio::SeqIO->new( -file => $filename, -format => $format );
while ( my $seqobj = $seqio->next_seq() ) {
   my $raw_sequence = $seqobj->seq;
   # do your searching on this raw sequence
}

Hope this helps. :)
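Inside that loop, `$seqobj->seq` hands back the sequence as an ordinary string, so either search from the parent post applies as-is. One wrinkle: the parent's regex uses `/i`, but `index` is case-sensitive, so if your files may contain lower-case bases (an assumption about your data), it may be worth normalizing case first. A minimal sketch follows; `motif_offsets` is a hypothetical helper, and the block does not load BioPerl, using the parent's YBL027W string (lower-cased) as a stand-in for `$seqobj->seq`.

```perl
use strict;
use warnings;

# Hypothetical helper for the "# do your searching" step: every
# non-overlapping start offset of $motif in $raw, ignoring case.
sub motif_offsets {
    my ( $raw, $motif ) = @_;
    return () unless length $motif;    # an empty motif would never advance
    # index() is case-sensitive, so compare upper-cased copies.
    my ( $hay, $needle ) = ( uc $raw, uc $motif );
    my @pos;
    for ( my $i = index( $hay, $needle );
          $i != -1;
          $i = index( $hay, $needle, $i + length($needle) ) )
    {
        push @pos, $i;
    }
    return @pos;
}

# Stand-in for $seqobj->seq: the sequence from the parent post, lower-cased.
my $raw_sequence =
  lc 'GUAUGUUUAACAGUGAUACUAAAUUUUGAACCUUUCACAAGAUUUAUCUUUAAAUAUGUUAUGA';
print join( ' ', motif_offsets( $raw_sequence, 'UUUAA' ) ), "\n";    # prints 5 48
```

In the `while` loop you would then call something like `my @hits = motif_offsets( $seqobj->seq, $motif );`, where `$motif` is whatever you are searching for.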
