Re3: BioInformatics - polyA tail search


This is completely untested, but you can use it as a starting point. Warning: it slurps each file into memory in one go, so it could be memory intensive for large files!

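# Open the file and slurp the contents into a string.
# Use 'or', not '||': '||' binds to the filename string, so die would never run.
open my $fh, '<', 'File_To_Read'
    or die "Cannot open 'File_To_Read' for reading: $!\n";
# Undefine the input record separator ($/) locally so one read returns everything.
my $file = do { local $/; <$fh> };
close $fh;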

# Remove all the characters we don't care about.
$file =~ s/[^ANGTC]//g;

# Walk through the string, looking for matches.
# The parentheses capture each match so that $1 is actually set.
while ($file =~ /([AN]{10})/g)
{
    print "$1\n";
}

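You're going to have to add the loop around the files, add any other letters you want to keep to the character class in the substitution, etc. Also note that matches found with /g never overlap: a run longer than 10 characters comes out as consecutive 10-character pieces, with any leftover shorter than 10 dropped. You'll have to add handling if you'd rather see each run once, or every overlapping window. Good luck!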
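
Here is one way those pieces might fit together. It is a rough sketch, not the definitive version. The names find_runs and read_sequence are made up for illustration, the files are assumed to come from the command line, and I've assumed you want each whole run of 10 or more reported once, along with its offset in the cleaned-up string.

```perl
use strict;
use warnings;

# Hypothetical helper: return [offset, run] pairs for every run of at
# least $min_len consecutive A/N characters in $seq. The open-ended
# {n,} quantifier makes each long run come back as one match.
sub find_runs {
    my ($seq, $min_len) = @_;
    $min_len = 10 unless defined $min_len;
    my @runs;
    while ($seq =~ /([AN]{$min_len,})/g) {
        # $-[1] is the offset where capture group 1 started.
        push @runs, [ $-[1], $1 ];
    }
    return @runs;
}

# Hypothetical helper: slurp one file and keep only the letters we care about.
sub read_sequence {
    my ($path) = @_;
    open my $fh, '<', $path
        or die "Cannot open '$path' for reading: $!\n";
    my $seq = do { local $/; <$fh> };
    close $fh;
    $seq = '' unless defined $seq;
    # Delete every character that is not A, N, G, T or C.
    $seq =~ tr/ANGTC//cd;
    return $seq;
}

# Assumption: the files to scan are given as command-line arguments.
for my $path (@ARGV) {
    for my $run (find_runs(read_sequence($path))) {
        my ($offset, $text) = @$run;
        print join("\t", $path, $offset, length($text), $text), "\n";
    }
}
```

The offsets refer to the string after the unwanted characters have been stripped, not to positions in the original file. This still slurps each file whole, so the memory warning above applies here too.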

------
We are the carpenters and bricklayers of the Information Age.

The idea is a little like C++ templates, except not quite so brain-meltingly complicated. -- TheDamian, Exegesis 6

Please remember that I'm crufty and crotchety. All opinions are purely mine and all code is untested, unless otherwise specified.
