Hi, I have a problem formatting some output. I have a reference DNA sequence, that is a string of letters, for example:

atgtagctagctagctaacgagcgctagctagctagtgatgactgat

Then I have several substrings that match against that reference sequence, and what I want is to print them aligned under the reference, for example:

ref    atgtagctagctagctaacgagcgctagctagctagtgatg
substr     agctagctagctaac

I have already stored the position where each substring starts matching, and the length of the substring, so if it were just one substring, I could do something like this:

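$reference_string = 'gctagctgatgctagcagcagcatgtagctagctgacga';
$substring        = 'aatgctagctagc';
# a blank line as wide as the reference (qw{ } is an empty list, so use ' ')
$output_line = ' ' x length($reference_string);
# 4-argument substr overwrites the blanks at the stored match position
substr $output_line, $start_position, $length, $substring;
print $reference_string, "\n", $output_line, "\n";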

The problem is that I have many substrings, and they sometimes overlap each other. The resulting output should look something like this:

ref  agctagctagctagcatgctagctagctgatcgatgctagctagctgactgacgacg
out1 atctagcat agctagcgatcga gactgacagc
out2 tagctagctgctagc
out3 agtcgatcgatgctagc

So I thought of the following rules (something like pseudocode):

create one blank line of output
foreach substring
    take the first blank line
    if there is no overlap
        substr the blank line
    if there is overlap
        create a new blank line
        substr the new blank line

But then, I can find an overlap in that second blank line too, so I would have to repeat the process, checking if there is an overlap in the second blank line, etc. I thought of writing a recursive subroutine, checking each time if there is an overlap in the first blank line, and continuing deeper until it finds there is no overlap, or it creates a new one.
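
For comparison, the same first-fit idea can probably be written with a plain nested loop instead of recursion: for each substring, walk the existing output lines in order and use the first one whose target columns are still blank, adding a new line only when none fits. The sketch below is only an illustration under stated assumptions: the name layout_rows, the [start, text] pair format, and the match positions in the sample data are all made up for the example, and every match is assumed to lie inside the reference.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the original post): put each [start, text]
# pair on the first row whose target columns are still blank, and add a
# new blank row when every existing row overlaps.
sub layout_rows {
    my ($width, @matches) = @_;
    my @rows;
    MATCH:
    for my $match (@matches) {
        my ($start, $text) = @$match;
        my $len = length $text;
        for my $row (@rows) {
            # The slot is free if it contains nothing but spaces.
            if (substr($row, $start, $len) !~ /\S/) {
                substr($row, $start, $len) = $text;  # $row aliases the array element
                next MATCH;
            }
        }
        # No existing row had room: start a fresh blank row.
        my $new = ' ' x $width;
        substr($new, $start, $len) = $text;
        push @rows, $new;
    }
    return @rows;
}

# Sample data: the reference from the top of the post, with illustrative
# (assumed) match positions.
my $reference = 'atgtagctagctagctaacgagcgctagctagctagtgatgactgat';
my @matches   = (
    [ 0,  'atgtagct'        ],
    [ 4,  'agctagctagctaac' ],
    [ 6,  'ctagctagc'       ],
    [ 20, 'agcgctagc'       ],
);

my @rows = layout_rows(length $reference, @matches);

printf "%-5s%s\n", 'ref', $reference;
my $n = 1;
printf "%-5s%s\n", 'out' . $n++, $_ for @rows;
```

With this sample data the second and third substrings each overlap everything placed before them, so they open new lines, while the fourth fits back on the first line. The order of the input decides which line a substring lands on; sorting the matches by start position first tends to give a more compact layout.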

Do you think this is a good strategy? Is there a clearer way to do it? I find this approach somewhat cumbersome, and I have not managed to get it working yet. Thank you very much in advance for your help.

Roger


In reply to formatting output question (use of recursive subroutine?) by rogerd
