PerlMonks  

Re^2: Help to build a REGEXP (BioPerl)

by Anonymous Monk
on Mar 11, 2014 at 23:43 UTC


in reply to Re: Help to build a REGEXP (BioPerl)
in thread Help to build a REGEXP

It's supposed to be for an assignment and we must use REGEXPS...
The sequence, as you can see, is spread over multiple lines; that's why I tried to capture everything from /translation all the way to the first occurrence of exon in the file....

Replies are listed 'Best First'.
Re^3: Help to build a REGEXP (BioPerl)
by Kenosis (Priest) on Mar 11, 2014 at 23:59 UTC

    It's supposed to be for an assignment and we must use REGEXPS...

    That's akin to being asked to do a gainer off a diving board when just learning to swim. Especially so if you're in bioinformatics. From my experience, it would be more pedagogically sound to first learn to proficiently wield the (BioPerl) tools, then learn how to forge such tools...

    If you must, however, use a regex in your script, perhaps the following will be helpful:

    use strict;
    use warnings;
    use Bio::SeqIO;

    my $filename = 'sequences.gen';
    my $stream = Bio::SeqIO->new( -file => $filename, -format => 'GenBank' );

    while ( my $seq = $stream->next_seq() ) {
        my $trans = $seq->translate();
        print $trans->seq(), "\n";
    }

    my $string = 'This script uses a regex.';
    $string =~ s/uses/doesn't use/;
    print $string;

      Nice, but that doesn't work (because the text used by the OP does not constitute a valid genbank record).

      That could be worked around by getting the complete record, I guess. But the wrath of the teacher needs to be deflected too. Perhaps make the regex a (quoted) multiline capture? :)

        Yes, I said that, easily fixed.

        Still, it might be a good idea to do the obviously intended regexp multiline capture (intended by the teacher), especially as you included a regex line already.

        (oops, replied to myself... ah well, you get the idea)
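
        For what it's worth, here is a minimal sketch of what such a multiline capture might look like. It assumes the problem is that each line is matched on its own, so the closing quote is never in the string being tested; the OP's actual reading loop isn't shown, so that is a guess. The fragment below is made up for illustration, and the names $genbank and @proteins are hypothetical. It relies only on the fact that a GenBank record ends with a line containing "//".

```perl
use strict;
use warnings;

# Made-up GenBank-style fragment, for illustration only (not the OP's data).
my $genbank = <<'END';
     CDS             1..30
                     /gene="demo"
                     /translation="MKTAYIAKQR
                     QISFVKSHFS"
     exon            1..30
//
END

# Read from an in-memory filehandle so the sketch is self-contained.
open my $fh, '<', \$genbank or die "Cannot open in-memory file: $!";

my @proteins;
{
    local $/ = "//\n";    # read one whole record per <$fh> instead of one line
    while ( my $record = <$fh> ) {
        # /s lets . cross newlines; .*? stops at the first closing quote.
        while ( $record =~ /\/translation="(.*?)"/sg ) {
            ( my $protein = $1 ) =~ s/\s+//g;    # drop line breaks and indentation
            push @proteins, $protein;
        }
    }
}
close $fh;

print "$_\n" for @proteins;
```

        With the made-up fragment above this prints MKTAYIAKQRQISFVKSHFS. Slurping the whole file (local $/ = undef) would work just as well for a single record; reading per record simply keeps memory use down on big files.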

        Nice, but that doesn't work (because the text used by the OP does not constitute a valid genbank record).

        It's a snip from a valid genbank record, and it parses beautifully when simply pasted into a full record. The OP said, "I have this part of a file that I want to match..." The "part" is the snip provided. The assignment is pedagogically problematic as it is (IMO), but it would be even worse to require the raw parsing of an incomplete genbank record.

Re^3: Help to build a REGEXP (BioPerl)
by Anonymous Monk on Mar 11, 2014 at 23:47 UTC
    Still, this doesn't seem to work...
    if($line7=~/^\s+\/translation\=\"(.*?)\"/s) {$amino_acid_seq=$1;}

      Try m//ms instead of //gs
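
      To illustrate why the /m matters, here is a small sketch that assumes the whole record is held in one string (the variable $record and the sample text are made up). Without /m, ^ only anchors at the very start of the string, so a pattern starting with ^\s+ cannot reach a qualifier further down; with /m, ^ also matches right after each newline.

```perl
use strict;
use warnings;

# Made-up fragment; the qualifier sits on an indented line in mid-string.
my $record = <<'END';
     CDS             1..30
                     /translation="MKTAYIAKQR
                     QISFVKSHFS"
     exon            1..30
END

# /s only: . may cross newlines, but ^ still means "start of the string".
my ($s_only) = $record =~ /^\s+\/translation="(.*?)"/s;

# /ms: ^ may also match at the start of any line inside the string.
my ($with_m) = $record =~ /^\s+\/translation="(.*?)"/ms;

print defined $s_only ? "/s alone: matched\n" : "/s alone: no match\n";
print defined $with_m ? "/ms: matched\n"      : "/ms: no match\n";
```

      On this sample the first line reports no match and the second reports a match. The captured text still contains the line break and indentation, so it needs something like s/\s+//g afterwards.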

      Also, use re 'debug'; to see how the regex engine matches your string ... you can also use rxrx - command-line REPL and wrapper for Regexp::Debugger
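
      A minimal sketch of the first suggestion, with a made-up input string. The pragma is lexically scoped, so wrapping it in a bare block limits the trace to the one regex of interest; the trace itself is written to STDERR.

```perl
use strict;
use warnings;

# Made-up input, just enough to exercise the pattern.
my $text = qq{   /translation="MKV\n   LLA"\n};

my $seq;
{
    use re 'debug';    # only regexes compiled inside this block are traced
    ($seq) = $text =~ /^\s+\/translation="(.*?)"/ms;
}

# The trace went to STDERR, so normal output stays clean.
print defined $seq ? "captured\n" : "no match\n";
```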

      Also interesting (but a tad more of a PITA to install) is wxPPIxregexplain.pl
