in reply to Re: trivial wrapping
in thread trivial wrapping

thanks for this non-destructive method! ;) I am now trying to get more complicated and have written a sub-routine to calculate the mis-matches between the two sequences, ideally i want it to print out like this;
AGACACTACTGCTG ***** *** ACTACTTACCATCG
If you see what i mean. I tried to plug the $mismatches into the code you gave me but its not working. it thinks $mismatches is another string!
my $mismatches = get_mismatches ($genome1, $genome2); my @strings = ($genome1, $mismatches, $genome2); for (my $x = 0; $x < length ($strings[0]; $x +=60) { print join ("\n", map (substr ($strings[$_], $x, 60), 0 .. 2)). " +\n\n"; }
The subroutine get_mismatches works perfectly well. Can you see why its going wrong? e.g the above code prints this;
TGTATCTACTGACTGAC TGTATCTACTGACTGAC TGTATCTACTGACTGAC ATCTG ATCTG ATCTG

Replies are listed 'Best First'.
Re: Re: Re: trivial wrapping
by clairudjinn (Beadle) on Nov 18, 2002 at 16:53 UTC
    I think it would be prudent for you to refrain from writing mismatch tools etc. unless you REALLY know what you are about. Simply interjecting a * between mismatched nucleotides only works if the sequences are equal length and have no more correct alignment, and has only limited usefulness besides. Are you prepared to write a fully-fledged, correct alignment tool just so you can find mismatches? If you are, you are either a glutton for punishment or absolutely silly, or both, and hopefully brilliant regardless. ;) To align two sequences just use pairwise BLAST or the EMBOSS Smith-Waterman tool. Whatever you do, I recommend that you check out bioperl (http://www.bioperl.org or find it on CPAN in the Miscellaneous->Bio namespace) v1.02 first. You can save loads of development time and be using stronger code that has been tested and improved by lots of people. There is a lot of useful functionality. Granted, some of it is the bleeding edge of development so there are a lot of bugs and it is improved constantly. But the basics of bioinformatics, like converting between formats (EMBL to FASTA for example) and BLASTing (and more importantly, parsing BLAST output), and calling/parsing EMBOSS tools have all been around for a while and so are (dare I say it) trustworthy in bioperl. The code is also readable and easy to program. To convert from raw to FASTA, for example:
    use Bio::SeqIO; my $instream = Bio::SeqIO->new( -file => 'your_in_file', -format => 'raw' ); my $outstream = Bio::SeqIO->new( -file => '>>your_out_file', -format => 'Fasta' )'; while ( my $sequence = $instream->next_seq() ) { $outstream->write_seq( $sequence ); }
    Take it from someone who has been down this road: it's hard to worm your way into understanding how to use bioperl's many interlocking modules but once you define the (I would bet) small subset that you need, which is the hard part, any time spent is time saved in the long run. Don't reinvent the wheel. Good luck. :)