in reply to Re: formatting output question (use of recursive subroutine?)
in thread formatting output question (use of recursive subroutine?)

I need to to put more than one match per line, using the fewest number of total lines. The preference should be to put the match in the first line if there is no overlap, then the second, etc.

I will read something about bit mask, I don't know what is it.

  • Comment on Re^2: formatting output question (use of recursive subroutine?)

Replies are listed 'Best First'.
Re^3: formatting output question (use of recursive subroutine?)
by Limbic~Region (Chancellor) on Jul 30, 2008 at 00:05 UTC
    rogerd,
    If you need to put more than one match on a line using the fewest number of total lines, you have a variation on Bin Packing Problem which is NP hard. I can write you a solution, but I don't have time at the moment (you can search this site and CPAN for examples that you could adapt).

    Update: Following paragraph added
    The bit mask isn't going to help by itself. It was just an idea for determining overlap. Because you require the matches on as fewest lines as possible, you have to try all possible combinations. This is why the problem is NP hard. I am assuming that for a given DNA sequence, the number of strands to place is quite large. If this assumption is correct, your perfect solution is going to be slow so you might want to look at a heuristic approach instead.

    Cheers - L~R

      Thank you! I read about the "Bin packing problem", and some solutions to it. For what I need, an heuristic approach will be perfectly enough.

      But first, I have to solve how to determine if the substring "fits in the can". The idea of "fitting" is that if there are whitespaces in the output line at the positions where each letter of my substring takes place, it "fits". I thought of spliting the string into an array, in a way that each letter is now an element of the array, and then to check if the elements of the array in the positions of my substring are whitespaces or not, but I am not sure if it is a good alternative.