It's supposed to be for an assignment and we must use REGEXPS...
The sequence, as you can see, is spread over multiple lines; that's why I tried to capture everything from /translation all the way to the first occurrence of exon in the file.... | [reply] [d/l] |
It's supposed to be for an assignment and we must use REGEXPS...
That's akin to being asked to do a gainer off a diving board when just learning to swim. Especially so if you're in bioinformatics. From my experience, it would be more pedagogically sound to first learn to proficiently wield the (BioPerl) tools, then learn how to forge such tools...
If you must, however, use a regex in your script, perhaps the following will be helpful:
use strict;
use warnings;
use Bio::SeqIO;
my $filename = 'sequences.gen';
my $stream = Bio::SeqIO->new(
    -file   => $filename,
    -format => 'GenBank'
);
while ( my $seq = $stream->next_seq() ) {
    my $trans = $seq->translate();
    print $trans->seq(), "\n";
}
my $string = 'This script uses a regex.';
$string =~ s/uses/doesn't use/;
print $string;
| [reply] [d/l] |
Nice, but that doesn't work (because the text used by the OP does not constitute a valid GenBank record).
That could be worked around by getting the complete record, I guess. But the wrath of the teacher needs to be deflected too. Perhaps make the regex a (quoted) multiline capture? :)
| [reply] |
Still, this doesn't seem to work...
if($line7=~/^\s+\/translation\=\"(.*?)\"/s)
{$amino_acid_seq=$1;}
| [reply] [d/l] |
Try m//ms instead of //gs
Also, use re 'debug'; to see how the regex engine matches your string ... you can also use rxrx - command-line REPL and wrapper for Regexp::Debugger
Also interesting (but a tad more of a PITA to install) is wxPPIxregexplain.pl
| [reply] [d/l] |
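To make the m//ms suggestion concrete, here is a minimal sketch. It assumes the whole record has been read into a single string rather than one line at a time; the sample record and the name $record are made up for illustration, since the OP's actual input is not shown in the thread.

```perl
use strict;
use warnings;

# Hypothetical GenBank-style fragment; the OP's real data is not shown.
my $record = <<'END';
     CDS             join(1..20,40..60)
                     /translation="MKTAYIAKQR
                     QISFVKSHFS"
     exon            1..20
     exon            40..60
END

my $amino_acid_seq = '';

# /m lets ^ match at the start of every line in the string;
# /s lets . match newlines, so the capture can span several lines.
if ( $record =~ /^\s+\/translation="(.*?)"/ms ) {
    $amino_acid_seq = $1;
    $amino_acid_seq =~ s/\s+//g;    # drop line breaks and indentation
}

print "$amino_acid_seq\n";    # prints MKTAYIAKQRQISFVKSHFS
```

If $line7 only ever holds one line of the file, no combination of modifiers will help, because the closing quote is simply not in the string being matched.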
Also, once you get more than one line into $line7, you want non-greedy matching .*? as there are multiple "exon" strings
Also, you don't want to use m//g in scalar context, since each successful match then resumes from where the previous one ended
Also, perlrequick is a great quick reference :)
| [reply] [d/l] |
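The two points above (greedy versus non-greedy, and m//g in scalar context) can be seen in a small sketch. The string $text is invented for illustration and only mimics the shape of the OP's data.

```perl
use strict;
use warnings;

# Hypothetical text containing two "exon" markers.
my $text = "/translation=\"MKT\nAYI\" exon 1..20 exon 40..60";

# Greedy .* runs to the LAST "exon"; non-greedy .*? stops at the FIRST.
my ($greedy) = $text =~ /translation=(.*)exon/s;
my ($lazy)   = $text =~ /translation=(.*?)exon/s;

print "greedy: [$greedy]\n";
print "lazy:   [$lazy]\n";

# In scalar context, m//g remembers where the last match ended (see pos),
# so a second test against the same string starts from there, not from 0.
my $first  = $text =~ /exon/g ? pos($text) : undef;
my $second = $text =~ /exon/g ? pos($text) : undef;
my $third  = $text =~ /exon/g ? pos($text) : undef;    # no third "exon"

print "first match ended at $first, second at $second\n";
print "third attempt ", ( defined $third ? "matched" : "failed" ), "\n";
```

Without /g, each `if ( $text =~ /exon/ )` starts at the beginning of the string, which is usually what a one-off capture like the /translation one wants.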