in reply to Read in fasta format
As a welcoming note, have a read of Perl and Bioinformatics.
Now, a FastA file is just a string representing a nucleotide (DNA|RNA) sequence or a protein sequence, and Perl has several strong approaches to string manipulation. You can access such a file directly, for input or output (check File Input and Output in the Tutorials), or you can use one of the BioPerl modules to get an object that wraps the file and that you can process further.
The best thing about BioPerl is that you don't have to worry about writing a parser yourself if the file format is recognized by the module you use, and many formats are supported (e.g. GenBank, FastA, etc.). You can even convert among these formats or generate a sequence from scratch in a particular format, all the way through to very advanced bioinformatics tasks that involve sequence analysis. The possibilities abound.
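To give an idea of the "direct" route, here is a minimal sketch that reads FastA records with core Perl only. The sub name `read_fasta` and the demo records are made up for illustration, and it assumes a plain layout where each record is a `>` header line followed by one or more sequence lines. Real-world files can have more quirks, which is where BioPerl helps.

```perl
use strict;
use warnings;

# Read FastA records from an open filehandle.
# Returns a list of [header, sequence] pairs in file order.
# (read_fasta is a hypothetical helper, not part of any module.)
sub read_fasta {
    my ($fh) = @_;
    my @records;
    while ( my $line = <$fh> ) {
        $line =~ s/\r?\n\z//;        # strip the line ending (Unix or Windows)
        next if $line =~ /^\s*$/;    # skip blank lines
        if ( $line =~ /^>(.*)/ ) {
            push @records, [ $1, '' ];    # a '>' line starts a new record
        }
        elsif (@records) {
            $line =~ s/\s+//g;            # drop stray whitespace in the sequence
            $records[-1][1] .= $line;     # a sequence may span several lines
        }
    }
    return @records;
}

# Demo on an in-memory filehandle; the two records are invented data.
my $demo = ">seq1 first example\nACGTGA\nGATTACA\n>seq2\nTTGACC\n";
open my $fh, '<', \$demo or die "Cannot open in-memory file: $!";
for my $rec ( read_fasta($fh) ) {
    my ( $header, $sequence ) = @$rec;
    printf "%s\t%d bases\n", $header, length $sequence;
}
close $fh;
```

To use it on a real file, open the file with the usual three-argument `open` and pass that filehandle instead of the in-memory one.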
Here is a quick example that creates an input sequence object and does a basic analysis:

```perl
use strict;
use warnings;
use Bio::SeqIO;

# Counting motifs
my $file = "SeqHisham.txt";
my $in   = Bio::SeqIO->new(
    -format => 'fasta',
    -file   => $file,
);
my $motif_Count = 0;
my $motif       = 'ga';
while ( my $seq = $in->next_seq ) {
    # since the sequence would be processed in a certain way, convert to string
    my $string = $seq->seq;
    if ( $string =~ /$motif/i ) {    # make the regex appropriate...
        $motif_Count++;              # counts each matching sequence once
    }
}
print "Found $motif_Count hits\n";
```

As for the 'command prompt', that is another issue. You can do that using @ARGV (check perlvar, the Perl predefined variables documentation, for 'ARGV') or one of the Getopt modules. So start from the basics and build up: check www.bioPerl.org and their HowTos, but before that invest time in learning enough Perl to get you started. Check the Reviews section for book reviews, and also work on identifying a general learning path.
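For the command prompt part, a rough sketch with Getopt::Long could look like the following. The option names `--file` and `--motif` and the sub name `parse_args` are my own choices for illustration, and the defaults are simply the values used in the motif example. It only handles the arguments; the file reading itself is left to Bio::SeqIO or your own code.

```perl
use strict;
use warnings;
use Getopt::Long qw(GetOptionsFromArray);

# Parse options from a list of arguments (normally @ARGV).
# --file and --motif are hypothetical option names chosen for this sketch.
sub parse_args {
    my @args = @_;
    my %opt  = ( file => 'SeqHisham.txt', motif => 'ga' );    # defaults
    GetOptionsFromArray(
        \@args,
        'file=s'  => \$opt{file},
        'motif=s' => \$opt{motif},
    ) or die "Usage: $0 [--file FASTA] [--motif STRING]\n";

    # Plain @ARGV style: a leftover positional argument is taken as the file name.
    $opt{file} = shift @args if @args;
    return %opt;
}

my %opt = parse_args(@ARGV);
print "Would search $opt{file} for motif '$opt{motif}'\n";
```

Saved under some name of your choosing (say `count_motif.pl`, a made-up name), it could be called as `perl count_motif.pl --file SeqHisham.txt --motif ga`, or with just the file name as the only argument.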
Best of luck and have a nice Perl journey