in reply to Re^2: Help required in optimizing the output (PERL)
in thread Help required in optimizing the output (PERL)

This may not work completely, as there are about 600,000 paragraphs, and trying to eliminate them that way may not be very efficient (I haven't tested it). There are also about 200 keywords. (These are the maximum counts we are considering.) I will look at whether brute force or a spanning tree can be used. Thanks, all, for the help. I also understand the homework part; however, my comment was aimed only at the part where somebody, without knowing the facts or by simply assuming, made some unnecessary comments.

Re^4: Help required in optimizing the output (PERL)
by Anonymous Monk on Jun 22, 2010 at 20:16 UTC
    The brute-force method mentioned above said "for each combination of paragraphs".

    You mention 600000 paragraphs.

    The thought of 2^600000 iterations would definitely have me looking at the Spanning Tree solution.
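
To put a number on that figure, here is a minimal sketch that counts the decimal digits of 2^600000 using logarithms, so the value itself never has to be built. The helper name digits_in_power_of_two is made up for this illustration, and the sketch assumes that "each combination of paragraphs" means every subset of the 600,000 paragraphs.

```perl
use strict;
use warnings;
use POSIX qw(floor);

# Number of decimal digits in 2**$n, computed via logarithms so that
# the huge number itself never has to be constructed.
# (Hypothetical helper, named for this illustration only.)
sub digits_in_power_of_two {
    my ($n) = @_;
    return floor($n * log(2) / log(10)) + 1;
}

my $paragraphs = 600_000;
printf "2**%d has %d decimal digits\n",
    $paragraphs, digits_in_power_of_two($paragraphs);
```

For 600,000 paragraphs this reports 180618 digits, so an exhaustive walk over every subset is out of reach whatever the cost per iteration.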

      would definitely have me looking at the Spanning Tree solution.

      Code it!

      If you succeed in creating an undirected graph to represent those 600,000 nodes and the 600,000^2 weighted edges that connect them, using any of the Graph::* modules, please inform us.

      Please do so before you start running whichever algorithm you opt for: I'd be interested to know which module you choose, but as I likely have less than a century left in this world, I doubt I'll be around when it completes.

      On the other hand, a brute-force attempt projected to take 1 1/2 hours is worth the effort of exploring.
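
The 600,000^2 edge count mentioned above can be turned into a rough memory estimate. This is a back-of-the-envelope sketch only: the 8 bytes per edge is an assumption (one bare 64-bit weight, with no per-edge overhead from Perl or from any Graph::* module, which would make the real figure far larger), and the function names are invented for the example. It also shows the count when each undirected pair is stored only once.

```perl
use strict;
use warnings;

# Edge counts for a fully connected graph on $n nodes.
# (Hypothetical helpers, named for this illustration only.)
sub all_ordered_pairs { my ($n) = @_; return $n * $n; }            # the 600,000^2 figure
sub undirected_edges  { my ($n) = @_; return $n * ($n - 1) / 2; }  # each pair once, no self-loops

my $n              = 600_000;
my $bytes_per_edge = 8;   # assumption: one bare 64-bit weight per edge

for my $count (all_ordered_pairs($n), undirected_edges($n)) {
    printf "%.0f edges, about %.2f TB of raw weights\n",
        $count, $count * $bytes_per_edge / 1e12;
}
```

Even under that optimistic assumption the weights alone come to terabytes, before any algorithm has started running.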


      Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
      "Science is about questioning the status quo. Questioning authority".
      In the absence of evidence, opinion is indistinguishable from prejudice.
Re^4: Help required in optimizing the output (PERL)
by randomid (Initiate) on Jun 22, 2010 at 16:32 UTC
    There are about 600,000 paragraphs and about 200 words. One hash has the words as keys and the paragraphs they occur in as values. The other hash has the paragraphs as keys and the words contained in each as values.
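
A minimal sketch of how two such hashes might be built in a single pass is below. Everything in it is an assumption made for illustration rather than the poster's actual code: the names build_indexes, %paras_for_word and %words_for_para, the use of array indexes as paragraph IDs, the case-insensitive matching on \w+ tokens, and the three sample paragraphs.

```perl
use strict;
use warnings;

# Build both lookups from a list of paragraphs and a list of keywords.
# Paragraphs are identified by their index in @$paragraphs (an assumption).
sub build_indexes {
    my ($paragraphs, $keywords) = @_;

    my %is_keyword;
    $is_keyword{ lc $_ } = 1 for @$keywords;

    my (%paras_for_word, %words_for_para);
    for my $id (0 .. $#$paragraphs) {
        # Split the paragraph into lower-cased word tokens.
        for my $word (map { lc $_ } $paragraphs->[$id] =~ /\w+/g) {
            next unless $is_keyword{$word};
            $paras_for_word{$word}{$id} = 1;   # word      => set of paragraph IDs
            $words_for_para{$id}{$word} = 1;   # paragraph => set of keywords
        }
    }
    return (\%paras_for_word, \%words_for_para);
}

# Made-up sample data, for demonstration only.
my @paragraphs = (
    'Perl hashes make quick lookups easy.',
    'A graph has nodes and edges.',
    'Hashes can model a graph in Perl.',
);
my @keywords = qw(perl graph hashes);

my ($paras_for_word, $words_for_para) = build_indexes(\@paragraphs, \@keywords);

for my $word (sort keys %$paras_for_word) {
    printf "%-6s => paragraphs %s\n", $word,
        join ', ', sort { $a <=> $b } keys %{ $paras_for_word->{$word} };
}
```

Using inner hashes as sets keeps repeated words within a paragraph from being counted twice. With about 200 keywords, the word-to-paragraphs hash has at most about 200 top-level keys no matter how many paragraphs there are.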