Re (tilly) 1: (Golf) Fragment Reassembly

in reply to (Golf) Fragment Reassembly

I am surprised that nobody pointed out how this is related to Golf: Embedded In Order and (Golf) Ordered Combinations. From there we can define two helper functions:

sub c{@r='';@r=map{$c=$_;map$c.$_,@r}@_ for 1..shift;@r}

sub i{($t=pop)=~s/./.*\Q$&/gs;pop=~/$t/s}

which have bodies of 49 and 34 characters respectively. Add 14 for the surrounding pieces and we are at 97 characters. And then it is easy to finish off with

sub assemble {
my$n;{for(c($n++,map{split//}@_)){$v=$_;map{i($v,$_)||next}@_;return$_}redo}
}

whose body has 76 characters, for a total of 173 characters. (Note that I added 5 characters to allow it to be called twice without retaining state.)
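For readers who do not want to unpick the golf, here is a rough ungolfed reading of the three subs. It is a sketch only: the names `candidates`, `embedded_in_order`, and `assemble_embedded` are invented here for readability and do not appear in the golfed code, and the behavior is intended to match it rather than guaranteed to.

```perl
use strict;
use warnings;

# All strings of length $n over the given characters (repeats kept),
# built the way the golfed c() does it: prepend each character to
# every string from the previous round.
sub candidates {
    my ($n, @chars) = @_;
    my @strings = ('');
    for (1 .. $n) {
        my @longer;
        for my $ch (@chars) {
            push @longer, map { $ch . $_ } @strings;
        }
        @strings = @longer;
    }
    return @strings;
}

# True if the characters of $fragment occur in $string in order,
# not necessarily adjacent (the test that the golfed i() performs).
sub embedded_in_order {
    my ($string, $fragment) = @_;
    my $pattern = join '.*', map { quotemeta($_) } split //, $fragment;
    return $string =~ /$pattern/s;
}

# Try every string of length 0, 1, 2, ... and return the first one
# in which every fragment is embedded in order.
sub assemble_embedded {
    my @fragments = @_;
    my @chars = map { split // } @fragments;
    for (my $n = 0; ; $n++) {
        CANDIDATE: for my $candidate (candidates($n, @chars)) {
            for my $fragment (@fragments) {
                next CANDIDATE unless embedded_in_order($candidate, $fragment);
            }
            return $candidate;
        }
    }
}

print assemble_embedded(qw(oa af wf wa)), "\n";
```

Keeping `$n` local to `assemble_embedded` plays the same role as the `my$n;` in the golfed version: each call starts again from length 0.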

This is a theoretically correct solution, but be warned that it is not polynomial either in speed or memory requirements. So it isn't a very useful solution.
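To put rough numbers on that: at length n, `c` builds every string of that length over all the characters of all the fragments, repeats included, so the list holds (character count) ** n entries at once. The sketch below only does that arithmetic. `candidate_count` is a name made up here, and the GATTACA input is the one quoted in the first reply further down.

```perl
use strict;
use warnings;

# Number of strings the golfed c() would build for one length:
# (total characters across all fragments, repeats included) ** length.
sub candidate_count {
    my ($length, @fragments) = @_;
    my $chars = length join '', @fragments;
    return $chars ** $length;
}

my @small = qw(oa af wf wa);
my @dna   = qw(GATTACA ATTACA GATT AAGAT CCC);

printf "small input, length 4: %.0f strings\n", candidate_count(4, @small);
printf "DNA input, length %d: %.0f strings\n", $_, candidate_count($_, @dna)
    for 5 .. 7;
```

The small input has 8 characters, so the length-4 round is 8 ** 4 = 4096 strings. The GATTACA input has 25 characters, and no answer can be shorter than its 7-character fragment, so the length-7 round alone would hold 25 ** 7 = 6,103,515,625 strings.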

In fact it raises questions about what a solution is. This will not run on my machine with either of the original data sets. I do not have a machine to test on, but I do not believe it would succeed even with Perl compiled on a 64-bit machine with a very large amount of memory. So while the algorithm is fine on paper, it cannot work on the stated data set.

Is a correct algorithm that will not finish on practical machines considered a solution?

My test data is:

print assemble(qw(oa af wf wa));

which cheerfully finds "owaf" as its answer.

Re: Re (tilly) 1: (Golf) Fragment Reassembly by dws (Chancellor) on May 03, 2001 at 05:32 UTC
Confirming your observation about memory: on a 256 MB box, this runs for about 45 seconds before running out of memory when invoked as `print assemble qw(GATTACA ATTACA GATT AAGAT CCC);`. Good code compression, though.
Re: Re (tilly) 1: (Golf) Fragment Reassembly by MeowChow (Vicar) on May 03, 2001 at 06:08 UTC
I had considered explicitly stating that solutions such as yours, which iterate through all possible strings, would be rated in a separate class. This makes me wonder, however, whether there is a class of optimization problems for which iterating brute-force through the entire solution space is faster (algorithmically) than directly computing a solution.

You are a bit mistaken in choosing Golf: Embedded In Order, however, since being embedded in order is not the same thing as being a substring:

print assemble(qw(oa af wf wa));
# owaf - a wrong answer
# oafwfwa - a right answer

If you change that into an `index`, things work out better (and with less code):

sub c{@r='';@r=map{$c=$_;map$c.$_,@r}@_ for 1..shift;@r}
sub assemble {
my$n;{for(c($n++,map{split//}@_)){$v=$_;map{1+index$v,$_ or next}@_;return$_}redo}
}
print assemble(qw(oa af fa afa));

MeowChow s aamecha.s a..a\u$&owag.print
Re(tilly) 3: (Golf) Fragment Reassembly by tilly (Archbishop) on May 03, 2001 at 06:14 UTC
Oops, my misreading. As for the question, there actually are well-explored areas where the best known algorithms (by various criteria) are found by randomly guessing something with certain characteristics and then testing whether it really was a solution...
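The difference MeowChow points out between "embedded in order" and "substring" can be checked directly on the two candidate answers from the thread. This is a small sketch; `is_embedded` and `is_substring` are names chosen here for illustration, not subs from any of the posts.

```perl
use strict;
use warnings;

# Embedded in order: the characters of $fragment appear in $string
# in the same order, with gaps allowed.
sub is_embedded {
    my ($string, $fragment) = @_;
    my $pattern = join '.*', map { quotemeta($_) } split //, $fragment;
    return $string =~ /$pattern/s ? 1 : 0;
}

# Substring: $fragment appears in $string as one contiguous run.
sub is_substring {
    my ($string, $fragment) = @_;
    return index($string, $fragment) >= 0 ? 1 : 0;
}

for my $candidate (qw(owaf oafwfwa)) {
    for my $fragment (qw(oa af wf wa)) {
        printf "%-8s %-3s embedded=%d substring=%d\n",
            $candidate, $fragment,
            is_embedded($candidate, $fragment),
            is_substring($candidate, $fragment);
    }
}
```

In "owaf" the fragments "oa" and "wf" are embedded in order but are not substrings, which is why the first version accepts it and the `index` version does not.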