
You could take a diff between consecutive pages instead of counting lines. You'd have to experiment with different modules, e.g. HTML::Diff or Text::Diff, but because the template lines shared by neighbouring pages cancel out in a diff, this approach could also cope better with style/layout changes.
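Here is a minimal sketch of the idea. It assumes each page has already been fetched and split so that roughly one HTML element sits on each line. `unique_lines` is a hypothetical helper, not part of HTML::Diff or Text::Diff. It computes a longest-common-subsequence (LCS) line diff by hand so that it runs on core Perl alone. Lines shared by two consecutive pages are treated as template, and lines found only in the first page are treated as probable post content.

```perl
use strict;
use warnings;

# Hypothetical helper: return the lines of $page_a that are NOT part of the
# longest common subsequence shared with $page_b. Shared lines are most
# likely blog template (header, sidebar, footer); the rest is likely post text.
sub unique_lines {
    my ($page_a, $page_b) = @_;
    my @a = split /\n/, $page_a;
    my @b = split /\n/, $page_b;

    # $len[$i][$j] = LCS length of @a[$i..$#a] and @b[$j..$#b]
    my @len;
    for my $i (reverse 0 .. scalar(@a)) {
        for my $j (reverse 0 .. scalar(@b)) {
            if ($i == @a || $j == @b) {
                $len[$i][$j] = 0;
            }
            elsif ($a[$i] eq $b[$j]) {
                $len[$i][$j] = $len[$i + 1][$j + 1] + 1;
            }
            else {
                my ($down, $right) = ($len[$i + 1][$j], $len[$i][$j + 1]);
                $len[$i][$j] = $down > $right ? $down : $right;
            }
        }
    }

    # Walk the table from the start; lines of @a skipped by the LCS are
    # the ones that differ from the neighbouring page.
    my @unique;
    my ($i, $j) = (0, 0);
    while ($i < @a) {
        if ($j < @b && $a[$i] eq $b[$j]) {
            $i++; $j++;                  # common line: template
        }
        elsif ($j < @b && $len[$i][$j + 1] >= $len[$i + 1][$j]) {
            $j++;                        # line only in page B: skip it
        }
        else {
            push @unique, $a[$i++];      # line only in page A: keep it
        }
    }
    return @unique;
}

my $page1 = "<header>\n<nav>\nFirst post text\n<footer>";
my $page2 = "<header>\n<nav>\nSecond post text\n<footer>";
print "$_\n" for unique_lines($page1, $page2);
```

Because the comparison is between neighbouring pages rather than against a fixed line count, a site-wide layout change shows up in both pages and drops out as shared template. In real code you'd probably replace the hand-rolled LCS with a module such as Algorithm::Diff (which Text::Diff builds on).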