PerlMonks  

regular expressions

by Becky (Beadle)
on Jun 13, 2002 at 10:39 UTC ( [id://174138]=perlquestion )

Becky has asked for the wisdom of the Perl Monks concerning the following question:

How can I search through a file (of short peptide sequences if you're interested) and either delete all entries that match the patterns .WKP or .MRP, or write all entries except these ones to a new file?

Replies are listed 'Best First'.
Re: regular expressions
by Abigail-II (Bishop) on Jun 13, 2002 at 10:51 UTC
    perl -nwi -e 'print unless /\.(?:WKP|MRP)/' your_file

    Abigail

Re: regular expressions
by stajich (Chaplain) on Jun 13, 2002 at 14:02 UTC
    I'm guessing the peptides might be in a format where each entry spans more than one line? If so, you would want to skip the whole entry, not just the matching lines as in Abigail-II's suggestion. You can do this without external modules by removing the newlines, but if you are doing much work with biological sequences I'd humbly suggest BioPerl (see the BioPerl project website). If your sequences happen to be in swissprot format instead of fasta, you just have to change the 'fasta' below to 'swiss'. Nifty, eh.
    use strict;
    use warnings;
    use Bio::SeqIO;

    my $input  = Bio::SeqIO->new(-format => 'fasta', -file => 'inputfile');
    my $output = Bio::SeqIO->new(-format => 'fasta', -file => '>outputfile');

    while ( my $seq = $input->next_seq ) {
        # skip the whole entry if the sequence contains the
        # pattern, using Abigail-II's suggestion
        next if $seq->seq() =~ /\.(?:WKP|MRP)/;
        # otherwise write the sequence out to the output file
        $output->write_seq($seq);
    }

Approved by broquaint