Hi, I have a problem formatting some output. I have a reference DNA sequence, that is a string of letters, for example:
atgtagctagctagctaacgagcgctagctagctagtgatgactgat

Then I have several substrings that match against that reference sequence, and what I want is to print the aligned sequences, for example:
ref    atgtagctagctagctaacgagcgctagctagctagtgatg
substr     agctagctagctaac
I have already stored the position where each substring starts matching, and the length of the substring, so if it were just one substring, I could do something like this:
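    # $start_position and $length were stored earlier, when the match was found
    my $reference_string = 'gctagctgatgctagcagcagcatgtagctagctgacga';
    my $substring        = 'aatgctagctagc';
    # a blank line as long as the reference (' ' x N, not qw{ } x N)
    my $output_line      = ' ' x length($reference_string);
    # overwrite the blanks at the match position with the substring
    substr $output_line, $start_position, $length, $substring;
    print $reference_string, "\n", $output_line, "\n";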
The problem is that I have many substrings, which sometimes overlap each other. The resulting output should look something like this:
ref  agctagctagctagcatgctagctagctgatcgatgctagctagctgactgacgacg
out1 atctagcat agctagcgatcga gactgacagc
out2 tagctagctgctagc
out3 agtcgatcgatgctagc
So I thought of the following rules (something like pseudocode):
    create one blank line of output
    foreach substring
        take the first blank line
        if there is no overlap
            substr the blank line
        if there is overlap
            create a new blank line
            substr the new blank line
But then, I can find an overlap in that second blank line too, so I would have to repeat the process, checking if there is an overlap in the second blank line, etc. I thought of writing a recursive subroutine, checking each time if there is an overlap in the first blank line, and continuing deeper until it finds there is no overlap, or it creates a new one.
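A minimal sketch of the same idea written as a loop over the output lines instead of a recursive subroutine: each substring goes onto the first line whose target span is still blank, and a new line is created only when every existing line overlaps. The sub name `layout_rows` and the `[start, sequence]` pairs are hypothetical, chosen only for illustration, and the sketch assumes every match lies entirely inside the reference.

```perl
use strict;
use warnings;

# Place each hit on the first output row whose target span is still blank.
# $hits is a reference to a list of [start, sequence] pairs (hypothetical layout).
sub layout_rows {
    my ($ref_length, $hits) = @_;
    my @rows;
    HIT: for my $hit (@$hits) {
        my ($start, $seq) = @$hit;
        my $len = length $seq;
        for my $row (@rows) {
            # The span is free if it holds nothing but spaces.
            if (substr($row, $start, $len) !~ /\S/) {
                substr($row, $start, $len) = $seq;
                next HIT;
            }
        }
        # Every existing row overlaps, so start a new blank row.
        my $new = ' ' x $ref_length;
        substr($new, $start, $len) = $seq;
        push @rows, $new;
    }
    return @rows;
}

# Usage with made-up hits against the first example reference.
my $reference = 'atgtagctagctagctaacgagcgctagctagctagtgatg';
my @hits = ([4, 'agctagctagctaac'], [0, 'atgtag'], [24, 'ctagctagctag']);
print "ref  $reference\n";
my $n = 0;
printf "out%d %s\n", ++$n, $_ for layout_rows(length $reference, \@hits);
```

Here the second hit overlaps the first and is pushed onto a second line, while the third hit fits on the first line again. Sorting the hits by start position beforehand is not required for this to work, but it tends to keep the number of lines small.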
Do you think this is a good strategy? Can I do it in a clearer way? I find this approach somewhat cumbersome, and I haven't managed to solve it yet. Thank you very much in advance for your help.
Roger
In reply to formatting output question (use of recursive subroutine?) by rogerd