I know there are a lot of ways to parse HTML.
However, my goal is to create a list of meaningful words for building a document based on web page content.
I do not want to reinvent the wheel, so your suggestions would be appreciated. I am sure quite a few people have done this already.
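
For illustration, the kind of pipeline described above might be sketched roughly as follows. This is only a minimal sketch, assuming Python and nothing beyond its standard library. The names `TextExtractor`, `meaningful_words`, `min_length`, and the tiny `STOP_WORDS` set are hypothetical, introduced here for the example; "meaningful" is assumed to mean "not a stop word and at least three letters long", which may well be too crude for real pages.

```python
import re
from collections import Counter
from html.parser import HTMLParser

# Hypothetical, deliberately tiny stop-word list (an assumption for this sketch);
# a real list would be much longer and language-specific.
STOP_WORDS = {"a", "an", "and", "the", "of", "to", "in", "is", "it", "for", "on", "with"}


class TextExtractor(HTMLParser):
    """Collects text nodes, skipping the contents of <script> and <style>."""

    SKIPPED_TAGS = {"script", "style"}

    def __init__(self):
        super().__init__()
        self.chunks = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED_TAGS:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIPPED_TAGS and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep text only when we are not inside a skipped element.
        if self._skip_depth == 0:
            self.chunks.append(data)


def meaningful_words(html, min_length=3):
    """Return a Counter of lowercase words that pass the crude filter."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    text = " ".join(parser.chunks).lower()
    words = re.findall(r"[a-z]+", text)  # ASCII letters only, a simplification
    return Counter(w for w in words if len(w) >= min_length and w not in STOP_WORDS)
```

Calling `meaningful_words(page_source).most_common(20)` would then give the twenty most frequent remaining words. The regular expression only matches ASCII letters, so non-English pages would need a different tokenizer, and a dedicated library is likely a better choice for anything beyond a quick experiment.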