question 'string'

nicholaspr has asked for the wisdom of the Perl Monks concerning the following question:

Re: question 'string' by toolic (Bishop) on Mar 02, 2010 at 15:43 UTC
"I want to find the positions in seq1 where there is a '-' character"

See `index`.

"and delete those positions in seq2."

See `substr`.

"Is it possible to do it without a for loop??"

Possibly, but why do you have this constraint? Read the docs that I pointed to, try it with your own code, then, if you still have problems, post the code you have tried, along with actual and expected output.

Also, please do not post general Perl questions in the 'PerlMonks Discussion' section. Read "Where should I post X?".
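As a rough illustration of where `index` and `substr` lead, here is a minimal sketch. The helper names `gap_positions` and `delete_positions` are made up for this example, the short sequences are sample data, and "delete" is assumed to mean cutting the characters out. It still uses loops; it only shows the two built-ins at work.

```perl
use strict;
use warnings;

# Hypothetical helper: return the 0-based positions of every '-' in $seq,
# found with repeated calls to index().
sub gap_positions {
    my ($seq) = @_;
    my @positions;
    my $pos = index($seq, '-');
    while ($pos != -1) {
        push @positions, $pos;
        $pos = index($seq, '-', $pos + 1);
    }
    return @positions;
}

# Hypothetical helper: cut the characters at the given positions out of $seq.
# Positions are handled from highest to lowest so that an earlier removal
# does not shift the offsets that are still to be processed.
sub delete_positions {
    my ($seq, @positions) = @_;
    for my $pos (sort { $b <=> $a } @positions) {
        next if $pos >= length($seq);    # ignore positions past the end
        substr($seq, $pos, 1, '');       # 4-argument substr replaces in place
    }
    return $seq;
}

my $seq1 = '--TAGAG--T';
my $seq2 = '-ATTGAGATT';
print delete_positions($seq2, gap_positions($seq1)), "\n";
```

With these sample strings the gaps in `$seq1` sit at positions 0, 1, 7 and 8, so those four characters are removed from `$seq2`.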
Re: question 'string' by almut (Canon) on Mar 02, 2010 at 16:11 UTC
"...and delete those positions in seq2"

What exactly do you mean by "delete"? Cut out the chars at those positions, or somehow tag them as "invalid" (e.g. by also putting `"-"` in those places), or even something else? I.e., is the result supposed to be

    1-in:  --TAGAGATTGCCCGTAGGACGGGAAGGTGTCAACGTTTTACATTTTGAAC-
    2-in:  -ATTGAGATTGCCCGTAGGACGGGAAGGTGTCAACGTTTTACATTTTGAAC-
           ====================================================
    2-out: TTGAGATTGCCCGTAGGACGGGAAGGTGTCAACGTTTTACATTTTGAAC

or

    1-in:  --TAGAGATTGCCCGTAGGACGGGAAGGTGTCAACGTTTTACATTTTGAAC-
    2-in:  -ATTGAGATTGCCCGTAGGACGGGAAGGTGTCAACGTTTTACATTTTGAAC-
           ====================================================
    2-out: --TTGAGATTGCCCGTAGGACGGGAAGGTGTCAACGTTTTACATTTTGAAC-

I'm not much of a biologist, but keeping the sequences aligned (the latter variant) somehow seems to make more sense... (?)
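The two readings of "delete" can be put side by side in code. This is only a sketch: `cut_gaps` and `mask_gaps` are hypothetical names, and both assume the two sequences have the same length, which the original question does not state.

```perl
use strict;
use warnings;

# Hypothetical helper, first reading: drop every character of $seq whose
# position holds a '-' in $ref. The result is shorter than the input.
sub cut_gaps {
    my ($ref, $seq) = @_;
    die "sequences differ in length\n" unless length($ref) == length($seq);
    my @keep = grep { substr($ref, $_, 1) ne '-' } 0 .. length($ref) - 1;
    return join '', map { substr($seq, $_, 1) } @keep;
}

# Hypothetical helper, second reading: overwrite those characters with '-'
# so that $seq keeps its length and stays aligned with $ref.
sub mask_gaps {
    my ($ref, $seq) = @_;
    die "sequences differ in length\n" unless length($ref) == length($seq);
    for my $i (0 .. length($ref) - 1) {
        substr($seq, $i, 1, '-') if substr($ref, $i, 1) eq '-';
    }
    return $seq;
}

my $seq1 = '--TAGAGATTGCCCGTAGGACGGGAAGGTGTCAACGTTTTACATTTTGAAC-';
my $seq2 = '-ATTGAGATTGCCCGTAGGACGGGAAGGTGTCAACGTTTTACATTTTGAAC-';

print 'cut:  ', cut_gaps($seq1, $seq2), "\n";
print 'mask: ', mask_gaps($seq1, $seq2), "\n";
```

Which of the two is wanted depends on whether later steps need the sequences to stay column-aligned.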
Re: question 'string' by AnomalousMonk (Archbishop) on Mar 02, 2010 at 17:49 UTC
As almut wrote, a lot depends on just what you mean by 'delete'; the Devil is in the details. But this sort of thing is generally handled 'without a for loop' by bitwise boolean operations on strings (see examples below). BrowserUk is very good on this topic (as on so much else), and I seem to remember him or her addressing a similar question in the last month or two, but I can't put my finger on the node at the moment; look back through BrowserUk's posts and you should find much of interest on this subject.

Examples (these are by no means intended to represent the most efficient approaches to these problems!):

    >perl -wMstrict -le
    "my $seq1 = '--TAGAG--T';
     my $seq2 = '-ATTGAGATT';
     print 'question 1';
     (my $mask1 = $seq1) =~ tr{-ATGC}{\x00\xff};
     print qq{seq1 '$seq1'};
     print qq{seq2 '$seq2'};
     $seq2 &= $mask1;
     $seq2 =~ tr{\x00}{-};
     print qq{seq2 '$seq2'};
     print 'question 2';
     $seq1 = 'G-TATAG';
     $seq2 = 'GATCT-G';
     print qq{seq1 '$seq1'};
     print qq{seq2 '$seq2'};
     (   $mask1 = $seq1) =~ tr{-ATGC}{\x00\xff};
     (my $mask2 = $seq2) =~ tr{-ATGC}{\x00\xff};
     my $dmask = $mask1 & $mask2;
     $seq1 &= $dmask;
     $seq2 &= $dmask;
     my $diff = $seq1 ^ $seq2;
     $diff =~ tr{\x00-\xff}{=D};
     print qq{diff '$diff'};
    "
    question 1
    seq1 '--TAGAG--T'
    seq2 '-ATTGAGATT'
    seq2 '--TTGAG--T'
    question 2
    seq1 'G-TATAG'
    seq2 'GATCT-G'
    diff '===D==='

Update: Slightly improved example for Question 1.
Re: question 'string' by Anonymous Monk on Mar 02, 2010 at 15:45 UTC
    $ perl
    seq1='--TAGAGATTGCCCGTAGGACGGGAAGGTGTCAACGTTTTACATTTTGAAC-'
    ^Z
    Can't modify constant item in scalar assignment at - line 1, at EOF
    Execution of - aborted due to compilation errors.

I suggest you read perlintro, then RFC: Bioinformatics Tutorial.
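The error in that transcript most likely comes from the missing sigil: without a `$`, Perl treats `seq1` as a bareword constant rather than a variable, so it cannot be assigned to. A minimal corrected sketch follows; the two `print` lines are added here purely for illustration.

```perl
use strict;
use warnings;

# Scalars need a '$' sigil, and under 'use strict' they must be declared,
# here with 'my'.
my $seq1 = '--TAGAGATTGCCCGTAGGACGGGAAGGTGTCAACGTTTTACATTTTGAAC-';

print 'length: ', length($seq1), "\n";
# tr/-// with an empty replacement list counts the '-' characters
# without modifying the string.
print 'gaps:   ', ($seq1 =~ tr/-//), "\n";
```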