Good idea, but I probably should have explained myself better. I need to filter out HTML as I read in the keyword. Something like this: get the keyword, go back some spaces, filter out the HTML and other keywords so it's plain text, then go ahead of the word a bit and do the same.
Then you should probably use an HTML parser class (HTML::Parser comes to mind) to strip out all of the markup first, and then apply my previous suggestion to the resulting plain text. It should just be a matter of plugging together two modular solutions, since what looks like a third problem is really just two problems rolled into one.
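To make the "two modular pieces" idea concrete, here is a rough sketch of how they might plug together. It is written in Python with the standard library's `html.parser` standing in for Perl's HTML::Parser, purely so the example is self-contained. The names `TextExtractor`, `strip_markup`, `keyword_context`, and the `width` parameter are all made up for illustration, and the context step is only a simple stand-in for the earlier suggestion, not a reproduction of it.

```python
import re
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Hypothetical helper: keeps text content, drops every tag."""

    def __init__(self):
        # convert_charrefs=True turns entities such as &amp; into plain characters
        super().__init__(convert_charrefs=True)
        self.chunks = []
        self._skip_depth = 0  # greater than 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Only keep text that is not script or stylesheet source
        if not self._skip_depth:
            self.chunks.append(data)


def strip_markup(html):
    """Step 1: reduce an HTML string to plain text with collapsed whitespace."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    # Chunks are joined without a separator so inline tags do not split words
    return " ".join("".join(parser.chunks).split())


def keyword_context(text, keyword, width=30):
    """Step 2: return (before, after) text around each keyword occurrence.

    Assumption: "context" means up to `width` characters on either side,
    matched case-insensitively.
    """
    if not keyword:
        return []
    results = []
    for match in re.finditer(re.escape(keyword), text, re.IGNORECASE):
        before = text[max(0, match.start() - width):match.start()]
        after = text[match.end():match.end() + width]
        results.append((before, after))
    return results
```

Calling `keyword_context(strip_markup(page), "fox", width=10)` on a page then gives the plain-text neighbours of each hit. Keeping the two steps as separate functions means either one can be swapped out (a different parser, a word-based window instead of a character-based one) without touching the other.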