http://qs1969.pair.com?node_id=77540


in reply to (Golf) Fragment Reassembly

I am surprised that nobody pointed out how this is related to Golf: Embedded In Order and (Golf) Ordered Combinations. From there we can define two helper functions:
sub c{@r='';@r=map{$c=$_;map$c.$_,@r}@_ for 1..shift;@r}
sub i{($t=pop)=~s/./.*\Q$&/gs;pop=~/$t/s}
whose bodies are 49 and 34 characters respectively, plus 14 for the surrounding pieces, so we are at 97 characters. It is then easy to finish off with
sub assemble {
my$n;{for(c($n++,map{split//}@_)){$v=$_;map{i($v,$_)||next}@_;return$_}redo}
}
whose body has 76 characters, for a total of 173 characters. (Note that I added 5 characters to allow it to be called twice without retaining state.)
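For readers who do not want to unpick the golfed code, here is an ungolfed sketch of the same brute-force idea. The names combos, embedded and assemble_readable are made up for this illustration and do not appear in the original post; the logic is intended to mirror c, i and assemble, not to improve on them.

```perl
use strict;
use warnings;

# All strings of length $n over the given characters. Duplicate
# characters are kept, just as in the golfed c().
sub combos {
    my ($n, @chars) = @_;
    my @strings = ('');
    for (1 .. $n) {
        @strings = map { my $head = $_; map { $head . $_ } @strings } @chars;
    }
    return @strings;
}

# True if the characters of $frag occur in $str in the same order,
# not necessarily adjacent (the "embedded in order" test of i()).
sub embedded {
    my ($str, $frag) = @_;
    my $pattern = join '', map { my $ch = quotemeta $_; ".*$ch" } split //, $frag;
    return $str =~ /$pattern/s ? 1 : 0;
}

# Try every candidate of length 0, 1, 2, ... and return the first
# one that embeds every fragment.
sub assemble_readable {
    my @frags = @_;
    my @chars = map { split // } @frags;
    for (my $n = 0; ; $n++) {
        CANDIDATE: for my $candidate (combos($n, @chars)) {
            for my $frag (@frags) {
                next CANDIDATE unless embedded($candidate, $frag);
            }
            return $candidate;
        }
    }
}

print assemble_readable(qw(oa af wf wa)), "\n";
```

Note that combos builds the complete list for a given length before any candidate is tested, which is where the memory cost comes from.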

This is a theoretically correct solution, but be warned that it is polynomial in neither running time nor memory. So it isn't a very useful solution.

In fact it raises questions about what a solution is. This will not run on my machine with either of the original data sets. I do not have a suitable machine to test on, but I do not believe it would succeed even with Perl compiled on a 64-bit machine with a very large amount of memory. So while the algorithm is fine on paper, it cannot work on the stated data set.

Is a correct algorithm that will not finish on practical machines considered a solution?

My test data is:

print assemble(qw(oa af wf wa));
which cheerfully finds "owaf" as its answer.
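To put a number on the growth for this small test input, here is a rough sketch. The helper name candidates_up_to is hypothetical; it assumes, as the golfed code does, that the characters obtained by splitting the fragments are not deduplicated, so four two-letter fragments give 8 characters.

```perl
use strict;
use warnings;

# Number of strings the brute-force search builds for lengths
# 0 .. $len when it has $k characters to choose from.
sub candidates_up_to {
    my ($k, $len) = @_;
    my $total = 0;
    $total += $k ** $_ for 0 .. $len;
    return $total;
}

my @frags = qw(oa af wf wa);
my @chars = map { split // } @frags;   # 8 characters, duplicates included
my $k     = @chars;

printf "%d characters, answer of length 4: %d strings built\n",
    $k, candidates_up_to($k, 4);
```

With 8 characters the lists for lengths 0 through 4 hold 1 + 8 + 64 + 512 + 4096 = 4681 strings in total. The same 8 characters at length 12 would need a single list of 8**12 = 68,719,476,736 strings, which is why longer answers are out of reach.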

Re: Re (tilly) 1: (Golf) Fragment Reassembly
by dws (Chancellor) on May 03, 2001 at 05:32 UTC
    Confirming your observation about memory, this runs for about 45 seconds before running out of memory (on a 256 MB box) when run via
    print assemble qw(GATTACA ATTACA GATT AAGAT CCC);
    Good code compression, though.

Re: Re (tilly) 1: (Golf) Fragment Reassembly
by MeowChow (Vicar) on May 03, 2001 at 06:08 UTC
    I had considered explicitly stating that solutions such as yours, which iterate through all possible strings, would be rated in a separate class. This makes me wonder, however, if there is a class of optimization problems for which iterating brute-force through the entire solution space is faster (algorithmically) than directly computing a solution.

    You are a bit mistaken in choosing Golf: Embedded In Order, however, since that is not the same thing as a substring:

    print assemble(qw(oa af wf wa));
    # owaf - a wrong answer
    # oafwfwa - a right answer
    If you change that into an index, things work out better (and with less code):
    sub c{@r='';@r=map{$c=$_;map$c.$_,@r}@_ for 1..shift;@r}
    sub assemble {
    my$n;{for(c($n++,map{split//}@_)){$v=$_;map{1+index$v,$_ or next}@_;return$_}redo}
    }
    print assemble(qw(oa af fa afa));
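The difference between the two tests can be seen directly on the strings from the example above. This is only an illustrative sketch; is_embedded and is_substring are hypothetical names, standing in for the regex test in i and the 1+index test respectively.

```perl
use strict;
use warnings;

# Subsequence test: the characters of $frag appear in $str in order,
# gaps allowed (what the golfed i() checks).
sub is_embedded {
    my ($str, $frag) = @_;
    my $pattern = join '.*', map { quotemeta($_) } split //, $frag;
    return $str =~ /$pattern/s ? 1 : 0;
}

# Substring test: $frag appears in $str as one contiguous run
# (what 1+index$v,$_ checks).
sub is_substring {
    my ($str, $frag) = @_;
    return index($str, $frag) >= 0 ? 1 : 0;
}

for my $frag (qw(oa af wf wa)) {
    printf "%-2s in owaf: embedded=%d substring=%d\n",
        $frag, is_embedded('owaf', $frag), is_substring('owaf', $frag);
}
```

All four fragments are embedded in "owaf", but only "af" and "wa" are substrings of it, whereas "oafwfwa" contains all four contiguously.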
       MeowChow                                   
                   s aamecha.s a..a\u$&owag.print
      Oops, my misreading.

      As for the question, there actually are well-explored areas where the best known algorithms (by various criteria) are found by randomly guessing something with certain characteristics and then testing whether it really was a solution...
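As one small illustration of the guess-and-test pattern (not necessarily an example the poster had in mind), picking a prime in a range is commonly done by drawing random candidates and testing each one. The sketch below uses plain trial division as the test, which is only reasonable for small numbers, and the names is_prime and random_prime are made up for this example. It assumes the range contains at least one prime; otherwise the loop would never end.

```perl
use strict;
use warnings;

# Deterministic check by trial division; adequate for small numbers.
sub is_prime {
    my ($n) = @_;
    return 0 if $n < 2;
    for (my $d = 2; $d * $d <= $n; $d++) {
        return 0 if $n % $d == 0;
    }
    return 1;
}

# Guess random candidates in [$lo, $hi] until one passes the test.
sub random_prime {
    my ($lo, $hi) = @_;
    while (1) {
        my $guess = $lo + int rand($hi - $lo + 1);
        return $guess if is_prime($guess);
    }
}

print random_prime(1_000, 10_000), "\n";
```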