in reply to small search script
Instead of indexing the raw HTML, I'd personally index a plain-text version produced by a tool such as HTML::FormatText, so that tag and attribute names don't end up in the index as search terms.
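
For what it's worth, here is a minimal sketch of that idea, not a drop-in replacement for your script. It assumes a reasonably recent HTML::Format distribution from CPAN, one that provides the format_string class method. The helper name html_to_text, the sample markup, and the naive \w+ tokenizer are all made up for illustration.

```perl
use strict;
use warnings;
use HTML::FormatText;    # from CPAN; uses HTML::TreeBuilder internally

# Hypothetical helper: turn an HTML string into plain text for indexing.
sub html_to_text {
    my ($html) = @_;
    return HTML::FormatText->format_string(
        $html,
        leftmargin  => 0,
        rightmargin => 1000,    # wide margin so lines are not hard-wrapped
    );
}

# Sample input, invented for this example.
my $html = '<html><body><h1>Title</h1>'
         . '<p>Hello <b>world</b></p></body></html>';

my $text = html_to_text($html);

# Naive tokenizer (an assumption): lowercase every run of word characters.
my @words = map { lc } $text =~ /(\w+)/g;

print "$_\n" for @words;
```

The point is only that the indexer sees the words of the page and never the markup. If your pages are files on disk, format_file from the same module takes a filename in place of the string.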