The brute-force method described above iterates "for each combination of paragraphs".
You mention 600000 paragraphs.
The thought of 2^600000 iterations would definitely have me looking at the Spanning Tree solution.
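To put a rough size on that 2^600000 figure, here is a small sketch in Python (my choice for illustration; nothing in the thread uses it). It only counts the decimal digits of the number, using the identity digits = floor(n * log10(2)) + 1. The variable names are made up for the example.

```python
import math

n_paragraphs = 600_000  # paragraph count quoted in the thread

# The number of decimal digits in 2**n is floor(n * log10(2)) + 1.
digits = math.floor(n_paragraphs * math.log10(2)) + 1

print(f"2^{n_paragraphs} has {digits} decimal digits")
```

The loop count itself, never mind the loop, is a number with well over a hundred thousand digits.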
"... would definitely have me looking at the Spanning Tree solution."
Code it!
If you succeed in creating an undirected graph to represent those 600,000 nodes and the 600,000^2 weighted edges that connect them, using any of the Graph::* modules, please inform us.
And do so before you start running whichever algorithm you opt for, because I'd be interested to know which module you choose, but as I likely have less than a century left in this world, I doubt I'll be around when it completes.
On the other hand, a brute-force attempt projected to take 1 1/2 hours is worth the effort of exploring.
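For a sense of scale on the graph mentioned above, a back-of-envelope sketch in Python (used purely for illustration). The 8 bytes per edge weight is my assumption, and it ignores whatever per-node and per-edge overhead a real graph module adds, so treat the result as a lower bound rather than a measurement.

```python
n = 600_000            # node (paragraph) count quoted in the thread
bytes_per_weight = 8   # ASSUMPTION: one 8-byte number per edge weight, no overhead

all_pairs = n * n                      # the 600,000^2 figure
undirected_edges = n * (n - 1) // 2    # distinct unordered pairs, no self-loops
raw_bytes = undirected_edges * bytes_per_weight

print(f"ordered pairs:    {all_pairs:,}")
print(f"undirected edges: {undirected_edges:,}")
print(f"weights alone:    {raw_bytes / 1e12:.2f} TB")
```

Even counting each undirected edge only once, the weights alone come to more than a terabyte under that assumption.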
Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
"Science is about questioning the status quo. Questioning authority".
In the absence of evidence, opinion is indistinguishable from prejudice.
There are about 600000 paragraphs and about 200 words.
One hash is keyed by word, with the paragraphs in which that word occurs as the values.
The other hash is keyed by paragraph, with the words contained in that paragraph as the values.
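The two hashes described above could be built along these lines. This is only a sketch in Python, where a dict plays the role of a Perl hash; the function name `build_indexes`, the whitespace tokenizing, and the sample paragraphs are all invented for illustration and are not the poster's actual code or data.

```python
from collections import defaultdict

def build_indexes(paragraphs):
    """Build both lookups from a {paragraph_id: text} mapping (hypothetical helper)."""
    word_to_paras = defaultdict(set)   # word -> ids of paragraphs containing it
    para_to_words = {}                 # paragraph id -> set of words it contains
    for para_id, text in paragraphs.items():
        words = set(text.lower().split())  # ASSUMPTION: whitespace-separated words
        para_to_words[para_id] = words
        for word in words:
            word_to_paras[word].add(para_id)
    return dict(word_to_paras), para_to_words

# Tiny made-up sample standing in for the ~600000 real paragraphs.
sample = {1: "the cat sat", 2: "the dog ran", 3: "cat and dog"}
word_to_paras, para_to_words = build_indexes(sample)

print(sorted(word_to_paras["cat"]))   # paragraphs that contain "cat"
print(sorted(para_to_words[2]))       # words found in paragraph 2
```

With only about 200 distinct words, the word-keyed lookup stays small in key count, while each of its values can hold a large share of the paragraph ids.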