I have a different idea, though I have not fully tested or worked it out yet. For each paragraph, store the keywords it contains in a hash set. Then sort the paragraphs by the number of distinct keywords they contain. Going through the paragraphs from the "richest" one down, check whether every one of its keywords also appears in at least one of the other paragraphs that have not been removed. If so, remove the paragraph.
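Here is a minimal Python sketch of that idea, just to make the steps concrete. The function name `prune_paragraphs`, the word-based tokenization with `re.findall`, and the case-insensitive matching are my own assumptions, not part of the idea itself; swap in whatever keyword extraction you already have.

```python
import re


def prune_paragraphs(paragraphs, keywords):
    """Drop paragraphs whose keywords are all covered by the other kept ones.

    Hypothetical helper: keyword matching here is a plain case-insensitive
    whole-word lookup, which is an assumption made for illustration.
    """
    wanted = {k.lower() for k in keywords}

    # One hash set of keywords per paragraph.
    kw_sets = [wanted & set(re.findall(r"\w+", p.lower())) for p in paragraphs]

    kept = set(range(len(paragraphs)))

    # Visit paragraphs from the "richest" (most keywords) to the poorest.
    # sorted() is stable, so ties keep their original document order.
    order = sorted(range(len(paragraphs)),
                   key=lambda i: len(kw_sets[i]), reverse=True)

    for i in order:
        # Union of the keywords found in every other paragraph still kept.
        others = set().union(*(kw_sets[j] for j in kept if j != i))
        # Remove the paragraph if it contributes no keyword of its own.
        # Note: a paragraph with no keywords at all is removed as well.
        if kw_sets[i] <= others:
            kept.discard(i)

    # Return the surviving paragraphs in their original order.
    return [paragraphs[i] for i in sorted(kept)]


if __name__ == "__main__":
    paras = ["apple banana cherry", "apple banana", "cherry", "date"]
    print(prune_paragraphs(paras, ["apple", "banana", "cherry", "date"]))
```

Note that this is a greedy heuristic. Because it starts with the richest paragraph, it can throw away one paragraph that covers many keywords and keep several smaller ones instead, as the small example in the `__main__` block does. Every keyword that occurred somewhere stays covered, but the result is not guaranteed to be the smallest possible set of paragraphs.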