Instead of counting lines, you could take a diff between consecutive pages: whatever stays the same from page to page is probably template, and whatever changes is probably content. You'd have to experiment with different modules, e.g. HTML::Diff or Text::Diff, but this approach could also help with style/layout changes.
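To make the idea concrete, here is a rough sketch of the page-to-page diff. It is written in Python with the standard `difflib` module only to stay dependency-free; it is a stand-in for the Perl modules mentioned above, not their API, and in Perl Text::Diff's `diff` would play a similar role. The function name `changed_lines` and the sample pages are made up for illustration.

```python
import difflib


def changed_lines(old_page: str, new_page: str) -> tuple[list[str], list[str]]:
    """Return (removed, added) lines between two consecutive pages.

    Lines common to both pages (navigation, footer, other template
    markup) are dropped; what is left is a candidate for real content.
    """
    old = old_page.splitlines()
    new = new_page.splitlines()
    # autojunk=False keeps difflib from treating frequent lines as noise.
    matcher = difflib.SequenceMatcher(a=old, b=new, autojunk=False)

    removed: list[str] = []
    added: list[str] = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag in ("replace", "delete"):
            removed.extend(old[i1:i2])
        if tag in ("replace", "insert"):
            added.extend(new[j1:j2])
    return removed, added


# Hypothetical pages: same template, different post body.
page_a = "<div id='nav'>Home</div>\n<p>Post about Perl</p>\n<div id='footer'>blog</div>"
page_b = "<div id='nav'>Home</div>\n<p>Post about diffs</p>\n<div id='footer'>blog</div>"

removed, added = changed_lines(page_a, page_b)
print("removed:", removed)
print("added:", added)
```

This is a plain line-based diff, so it only works well if the HTML has reasonable line breaks. A tag-aware diff (the HTML::Diff route) would presumably cope better with pages that arrive as one long line.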