in reply to formatting output question (use of recursive subroutine?)

rogerd,
If there is no need to put more than one match on a line, the solution would look more like
my $dna = 'gctagctgatgctagcagcagcatgtagctagctgacga'; for my $strand (@fragment) { my $pos = index($dna, $strand); next if $pos == -1; my $pad = ' ' x $pos; print $pad, $strand, "\n"; }
If you really need to put more than one match per line and don't care about efficiency (using the fewest number of total lines), you can use a bit mask to determine if "this" strand needs to go on to the next line. If you need help with that, speak up.

Cheers - L~R

Replies are listed 'Best First'.
Re^2: formatting output question (use of recursive subroutine?)
by rogerd (Sexton) on Jul 29, 2008 at 22:29 UTC

    I need to to put more than one match per line, using the fewest number of total lines. The preference should be to put the match in the first line if there is no overlap, then the second, etc.

    I will read something about bit mask, I don't know what is it.

      rogerd,
      If you need to put more than one match on a line using the fewest number of total lines, you have a variation on Bin Packing Problem which is NP hard. I can write you a solution, but I don't have time at the moment (you can search this site and CPAN for examples that you could adapt).

      Update: Following paragraph added
      The bit mask isn't going to help by itself. It was just an idea for determining overlap. Because you require the matches on as fewest lines as possible, you have to try all possible combinations. This is why the problem is NP hard. I am assuming that for a given DNA sequence, the number of strands to place is quite large. If this assumption is correct, your perfect solution is going to be slow so you might want to look at a heuristic approach instead.

      Cheers - L~R

        Thank you! I read about the "Bin packing problem", and some solutions to it. For what I need, an heuristic approach will be perfectly enough.

        But first, I have to solve how to determine if the substring "fits in the can". The idea of "fitting" is that if there are whitespaces in the output line at the positions where each letter of my substring takes place, it "fits". I thought of spliting the string into an array, in a way that each letter is now an element of the array, and then to check if the elements of the array in the positions of my substring are whitespaces or not, but I am not sure if it is a good alternative.