PerlMonks  

Re: Re3: BioInformatics - polyA tail search

by fletcher_the_dog (Friar)
on Sep 02, 2003 at 18:44 UTC (id://288384)


in reply to Re3: BioInformatics - polyA tail search
in thread BioInformatics - polyA tail search

The definition of a polyA tail says "a string of length 10 or greater containing only 'A' or 'N'". If you erase all unwanted characters, then some 'A's and 'N's that weren't together before might come together. Also note that you probably want to match against [AN]{10,} so that if there are more than 10 A's or N's in a row the match does not fail. Also, MiamiGenome wanted the filenames. This modified version of your code might work a little better:
    # If your file extension is not .txt, change it to whatever is appropriate.
    while (my $filename = <*.txt>) {
        # Open the file and slurp the contents into a string.
        # 'or' (not '||') so that die applies to the open call itself.
        open(FILE, '<', $filename) or die "Cannot open '$filename' for reading: $!\n";
        # Undefine the input record separator locally to read the whole file at once.
        my $file = do { local $/; <FILE> };
        close FILE;
        # If a 'polyA' sequence is found, print the file name.
        if ($file =~ /[AN]{10,}/) {
            print "$filename has a polyA tail sequence\n";
        }
    }
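To illustrate the point about erasing unwanted characters, here is a minimal sketch. The sequence is made up for the example: two runs of six A's separated by other bases. Neither run qualifies on its own, but stripping everything except 'A' and 'N' fuses them into a run of twelve, which then looks like a polyA tail.

```perl
use strict;
use warnings;

# Hypothetical sequence: two runs of 6 A's separated by "CGT".
my $seq = ("A" x 6) . "CGT" . ("A" x 6);

# Erase every character that is not A or N (c = complement, d = delete).
(my $stripped = $seq) =~ tr/AN//cd;

# Test the untouched sequence and the stripped copy with the same pattern.
my $raw_hit      = ($seq      =~ /[AN]{10,}/) ? 1 : 0;
my $stripped_hit = ($stripped =~ /[AN]{10,}/) ? 1 : 0;

print "raw: $raw_hit, stripped: $stripped_hit\n";   # raw: 0, stripped: 1
```

Matching against the unmodified file contents, as in the loop above, avoids this kind of false positive.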

Replies are listed 'Best First'.
Re: Re: Re3: BioInformatics - polyA tail search
by runrig (Abbot) on Sep 02, 2003 at 19:14 UTC
    Also note that you probably want to match against [AN]{10,} so that if there are more than 10 A's or N's in a row the match does not fail.
    If there are more than ten, then {10} will match just fine.
      I wrote a little test script to test if you were right (and you were), so my question is what use is the upper range indicator? I thought it allowed you to limit the number of times that something matched, but apparently it does not.
      #!/usr/bin/perl
      use strict;
      my $seq = "ANANNNNANANANANANANANANANANA";
      if ($seq=~/[AN]{10,11}?/) {
          print "I matched\n";
      }
      else {
          print "I did not match!\n";
      }
      __OUTPUT__
      I matched
        A comma in the range is useful if you are matching something after the sequence, or if you are using capturing parentheses to save the matched sequence. If you only want to match sequences of exactly 10 and not longer, you would need a negative lookahead (see perlre) after the {10}.
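A small sketch of the three cases discussed in this subthread, using a made-up sequence with a run of twelve A's. Note one assumption beyond what was said above: the "exactly 10" pattern here also uses a negative lookbehind, because with the lookahead alone the regex can still match the last ten characters of a longer run.

```perl
use strict;
use warnings;

# Hypothetical sequence: a run of 12 A's flanked by other bases.
my $seq = "CC" . ("A" x 12) . "GG";

# {10} succeeds on a longer run; it just stops after 10 characters.
my ($ten) = $seq =~ /([AN]{10})/;

# {10,} matters when capturing: the greedy form grabs the whole run.
my ($all) = $seq =~ /([AN]{10,})/;

# Exactly 10 and no longer: forbid A or N on either side of the run.
my $exact = ($seq =~ /(?<![AN])[AN]{10}(?![AN])/) ? 1 : 0;

printf "%d %d %d\n", length($ten), length($all), $exact;   # 10 12 0
```

For simply detecting whether a file contains a polyA tail, {10} and {10,} give the same yes/no answer; the difference only shows up in what gets captured or in what must follow the run.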
