For the algorithm, the problem is the implementation
not the sort'n'subtract. With 4 values on top and 5 underneath, there
are numerous ways in which the subtraction needs to be
done.
That's why I simply enumerated each factorial's terms in increasing order and
merged the streams, canceling in passing. It was a heck of a lot
easier than writing a fast, robust representation of intervals, and yet this crude implementation is "fast enough": Less than 17
percent of my code's time is spent on this task (see profile report,
below). Were I to eliminate this time entirely, I would gain only a
one-fifth improvement in overall speed, and that fact suggests that this line of optimization has petered out.
Interesting tidbit: The entire pre-multiplication pipleline accounts for only 6 percent of the overall allocation cost, suggesting that the lazy build-and-merge-and-cancel operation works in near-constant space.
Cheers,
Tom
individual inherited
COST CENTRE entries %time %alloc %time %alloc
MAIN 0 0.0 0.0 100.0 100.0
main 1 0.0 0.0 90.5 82.4
pCutoff 1 0.0 0.0 90.5 82.4
toRatio 1 16.7 0.0 73.8 76.4
bigProduct 2 57.1 76.3 57.1 76.3
rpCutoff 1 0.0 0.0 16.7 6.0 <--- here
rdivide 1 0.0 0.0 0.0 0.0
rtimes 1 0.0 0.0 0.0 0.0
rtimes 0 0.0 0.0 4.8 0.7
rdivide 1 0.0 0.0 4.8 0.7
cancel 113163 4.8 0.7 4.8 0.7
merge 2 0.0 0.0 0.0 0.0
facproduct 2 0.0 0.0 11.9 5.3
fac 9 4.8 2.8 4.8 2.8
rproduct 2 0.0 0.0 7.1 2.5
rtimes 9 0.0 0.0 7.1 2.5
cancel 9 0.0 0.0 0.0 0.0
merge 294870 7.1 2.5 7.1 2.5
CAF 1 0.0 0.0 9.5 17.6
sciformat 3 0.0 0.0 9.5 17.6
tosci 7031 9.5 17.4 9.5 17.6
digits 122 0.0 0.2 0.0 0.2
Note: All pre-multiplication logic, including merging and canceling,
is contained within the inherited time marked "here".