I think it would be prudent to refrain from writing mismatch tools and the like unless you REALLY know what you are about. Simply interjecting a * between mismatched nucleotides only works if the sequences are of equal length and have no better alignment (one with gaps, for instance), and it has only limited usefulness besides. Are you prepared to write a fully fledged, correct alignment tool just so you can find mismatches? If you are, you are either a glutton for punishment or absolutely silly, or both, and hopefully brilliant regardless. ;) To align two sequences, just use pairwise BLAST or the EMBOSS Smith-Waterman tool.
Whatever you do, I recommend that you first check out bioperl v1.02 (http://www.bioperl.org, or find it on CPAN in the Miscellaneous->Bio namespace). You can save loads of development time and end up using stronger code that has been tested and improved by lots of people. There is a lot of useful functionality. Granted, some of it is at the bleeding edge of development, so there are a lot of bugs and it is constantly being improved. But the basics of bioinformatics, like converting between formats (EMBL to FASTA, for example), BLASTing (and, more importantly, parsing BLAST output), and calling/parsing EMBOSS tools, have all been around for a while and so are (dare I say it) trustworthy in bioperl.
The code is also readable and easy to program with. To convert from raw to FASTA, for example:
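use strict;
use warnings;
use Bio::SeqIO;

# Open the input file and read it as raw sequence data.
my $instream = Bio::SeqIO->new( -file   => 'your_in_file',
                                -format => 'raw' );
# '>>' appends to the output file; use '>' to overwrite it instead.
my $outstream = Bio::SeqIO->new( -file   => '>>your_out_file',
                                 -format => 'Fasta' );
# Copy each sequence across, converting the format on the way.
while ( my $sequence = $instream->next_seq() ) {
    $outstream->write_seq( $sequence );
}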
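To make the earlier point about naive mismatch marking concrete, here is a minimal sketch in plain Perl. The helper name mismatch_markers and the example sequences are made up for illustration and are not part of bioperl. It compares two equal-length strings position by position, which is exactly the approach that breaks down once an insertion or deletion is involved.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of bioperl): returns a marker string with
# '*' at every position where the two sequences differ and ' ' elsewhere.
# It compares position by position and performs no alignment at all.
sub mismatch_markers {
    my ($seq_a, $seq_b) = @_;
    die "sequences differ in length\n"
        unless length($seq_a) == length($seq_b);
    my $markers = '';
    for my $i (0 .. length($seq_a) - 1) {
        $markers .= substr($seq_a, $i, 1) eq substr($seq_b, $i, 1) ? ' ' : '*';
    }
    return $markers;
}

# One substitution: only the changed position is marked.
my $substitution = mismatch_markers('ACGTACGT', 'ACGAACGT');

# Same sequence with the first base dropped and one base added at the end:
# every column is shifted, so every position is marked.
my $shifted = mismatch_markers('ACGTACGT', 'CGTACGTA');

print "ACGTACGT\nACGAACGT\n$substitution\n\n";
print "ACGTACGT\nCGTACGTA\n$shifted\n";
```

The first pair is marked at a single position. The second pair differs by one shifted base, yet all eight positions are flagged, which is why a real aligner (pairwise BLAST or the EMBOSS Smith-Waterman tool) is the safer route.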
Take it from someone who has been down this road: it is hard to worm your way into understanding how to use bioperl's many interlocking modules. The hard part is defining the (I would bet) small subset that you need, but once you have done that, any time spent is time saved in the long run. Don't reinvent the wheel.
Good luck. :)