in reply to Help to build a REGEXP

Why don't you use one of the BioPerl modules for reading that aminofastawhateveritis?

Then you don't need to build a regex to parse the mystery $line7 variable, which probably contains only a single line. That would explain why only one line is returned: the other lines aren't in the variable ...

:)
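
To illustrate the point about $line7 holding only one line, here is a minimal sketch that contrasts reading line by line with slurping the whole input into one string. The record text and the variable names are made up for the example (the OP's real file isn't shown), and an in-memory filehandle stands in for the actual file.

```perl
use strict;
use warnings;

# Hypothetical GenBank-style fragment; the OP's real data is not shown.
my $text = join "\n",
    '     CDS             1..30',
    '                     /translation="MKVLA',
    '                     GHTRE"',
    '     exon            1..30',
    '';

# Reading in list context splits on newlines: each element holds one line,
# so a regex applied to a single element can never span several lines.
open my $by_line, '<', \$text or die "Cannot open in-memory file: $!";
my @lines = <$by_line>;
close $by_line;

# Slurp mode: with $/ undefined, a single read returns the entire input.
open my $slurp, '<', \$text or die "Cannot open in-memory file: $!";
my $whole = do { local $/; <$slurp> };
close $slurp;

printf "%d separate lines, %d newlines in the slurped string\n",
    scalar @lines, ( $whole =~ tr/\n// );
```

Only the slurped $whole gives a multi-line regex something to work on.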

Re^2: Help to build a REGEXP (BioPerl)
by Anonymous Monk on Mar 11, 2014 at 23:43 UTC
    It's supposed to be for an assignment and we must use REGEXPS...
    The sequence, as you can see, is spread over multiple lines; that's why I tried to capture everything from /translation all the way to the first occurrence of exon in the file....

      It's supposed to be for an assignment and we must use REGEXPS...

      That's akin to being asked to do a gainer off a diving board when just learning to swim. Especially so if you're in bioinformatics. From my experience, it would be more pedagogically sound to first learn to proficiently wield the (BioPerl) tools, then learn how to forge such tools...

      If you must, however, use a regex in your script, perhaps the following will be helpful:

      use strict;
      use warnings;
      use Bio::SeqIO;

      my $filename = 'sequences.gen';
      my $stream   = Bio::SeqIO->new( -file => $filename, -format => 'GenBank' );

      while ( my $seq = $stream->next_seq() ) {
          my $trans = $seq->translate();
          print $trans->seq(), "\n";
      }

      my $string = 'This script uses a regex.';
      $string =~ s/uses/doesn't use/;
      print $string;

        Nice, but that doesn't work (because the text used by the OP does not constitute a valid GenBank record).

        That could be worked around by getting the complete record, I guess. But the wrath of the teacher needs to be deflected too. Perhaps make the regex a (quoted) multiline capture? :)

      Still, this doesn't seem to work...
      if($line7=~/^\s+\/translation\=\"(.*?)\"/s) {$amino_acid_seq=$1;}

        Try m//ms instead of //gs
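
A minimal sketch of the m//ms suggestion, assuming the whole record is already in one string. The record text is invented for illustration, and $record stands in for the OP's $line7. Without /m, ^ anchors only at the very start of the string; with /m it also matches after each newline, and /s lets . cross line breaks.

```perl
use strict;
use warnings;

# Hypothetical record fragment standing in for the contents of $line7.
my $record = join "\n",
    '     CDS             1..30',
    '                     /translation="MKVLA',
    '                     GHTRE"',
    '     exon            1..30',
    '';

# Without /m: ^ can only match at offset 0, where "CDS" follows the
# leading whitespace, so the pattern fails.
my $matched_without_m = $record =~ /^\s+\/translation="(.*?)"/s ? 1 : 0;

# With /m and /s: ^ matches at the start of the /translation line and
# .*? runs across the line break up to the first closing quote.
my $amino_acid_seq = '';
if ( $record =~ m{^\s+/translation="(.*?)"}ms ) {
    $amino_acid_seq = $1;
    $amino_acid_seq =~ s/\s+//g;    # remove line breaks and indentation
}

print "matched without /m: $matched_without_m\n";
print "sequence with /ms:  $amino_acid_seq\n";
```

The m{...} delimiters are only there to avoid escaping the slash in /translation.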

        Also, use re 'debug'; to see how the regex engine matches your string ... you can also use rxrx - command-line REPL and wrapper for Regexp::Debugger

        Also interesting (but a tad more of a PITA to install) is wxPPIxregexplain.pl

Re^2: Help to build a REGEXP
by Anonymous Monk on Mar 11, 2014 at 23:43 UTC

    Also, once you get more than one line into $line7, you want non-greedy matching .*? as there are multiple "exon" strings
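
A small sketch of the greedy versus non-greedy difference when "exon" occurs more than once. The input string is a made-up one-liner, not the OP's data.

```perl
use strict;
use warnings;

# Hypothetical text containing two "exon" strings.
my $text = 'CDS /translation="MKV" exon 1..10 misc exon 20..30';

# Greedy .* grabs as much as possible, so it stops at the LAST "exon".
my ($greedy) = $text =~ /translation="(.*)exon/s;

# Non-greedy .*? grabs as little as possible, so it stops at the FIRST "exon".
my ($lazy) = $text =~ /translation="(.*?)exon/s;

print "greedy: [$greedy]\n";
print "lazy:   [$lazy]\n";
```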

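    Also, you don't want to use m//g in scalar context: there it remembers the position of the last match, so a repeated test on the same string resumes from that point instead of from the beginning.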
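
A short sketch of why m//g in scalar context can surprise you, using a made-up string. Each successful scalar-context match with /g leaves pos() set, and the next /g match on the same variable starts from there.

```perl
use strict;
use warnings;

my $line = 'exon 1..10';    # hypothetical input line

# First test in scalar (boolean) context: matches and sets pos($line).
my $first           = $line =~ /exon/g ? 1 : 0;
my $pos_after_first = pos($line);

# Second identical test resumes after the previous match, finds no
# further "exon", fails, and resets pos($line).
my $second = $line =~ /exon/g ? 1 : 0;

# Without /g the search always starts at the beginning of the string.
my $third = $line =~ /exon/ ? 1 : 0;

print "first=$first pos=$pos_after_first second=$second third=$third\n";
```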

    Also, perlrequick is a great quick reference :)