3,0.001355
4,0.002366
5,0.003744
6,0.005187
7,0.006808
8,0.008807
9,0.011459
10,0.014172
11,0.017328
12,0.020473
13,0.023491
14,0.027905
15,0.031813
16,0.036426
17,0.043177
18,0.067228
19,0.052098
20,0.059449
30,0.132611
40,0.235241
50,0.367828
60,0.510851
70,0.695534
80,0.913879
90,1.178992
100,1.427899
125,2.254988
150,3.269179
200,5.831225
These results are from a second copy of your program that I created. It appears my first copy still has a bug.
Yes, there must be a bug in my original copy. Running my full data set with my backup copy of your program gave a runtime of 14.759568. I'd say that's a little better than three years! I'll check the results to see how they compare with some of my old results from limited data sets. That check could take a little time :-)
The problem could be with the way PerlIDE runs scripts. All the times above are from the command line (Win2000). I ran the full data set with the second copy of the program from PerlIDE and got a runtime of 15.147881. So the problem is with my first copy of your program.
I really like your algorithm. It's speed will allow me to modify my original data and test a couple of theories. You've got a real winner.
update
I checked the results from the program and the algorithm isn't finding all the matches. I made a modification which seems to do better.
I added a new variable called $filterSize. This is used to exclude matches below a minimum length for output. It replaces $subStrSize. $subStrSize is now used to set the minimum string search size. The statement used to output the results is also modified.
my $subStrSize = 128;
my $filterSize = 256;
.
.
.
$localBest[4] > $filterSize
? printf OUT "%10s::%-10s Length[%4d] Starts(%4d %4d)\n",
$strings[$localBest[0]][0], $strings[$localBest[1]][0], $loc
+alBest[4], $localBest[2], $localBest[3]
: print OUT " Didn't find LCS for $sourceName and $targetName\n
+";
The last statement replaces:
@localBest
? printf OUT "%10s::%-10s Length[%4d] Starts(%4d %4d)\n",
$strings[$localBest[0]][0], $strings[$localBest[1]][0], $loc
+alBest[4], $localBest[2], $localBest[3]
: print OUT " Didn't find LCS for $sourceName and $targetName\n
+";
I hoped this would give me better results. I used $filterSize = 256. I set $subStrSize to 16, 32, 64, or 128. The runtimes for $subStrSize were 250.505445, 124.928021, 57.274645, and 32.787612, respectively. I got the best results with $subStrSize = 128, $filterSize = 256. I don't know why $subStrSize = 128 gave better results than shorter lengths. Also $subStrSize = 256 gave significantly poorer results than any of the shorter lengths. |