The diff approach is error-prone on lots of sites because ads are randomized and menus often change from page to page, even if only by a single link. Ovid made some good points. Another thing I've relied on when doing this kind of thing is that content has entirely different semantics from navigation and junk.
An article is made of sentences, and not just one or two but a dozen or more. Ads and navigation are rarely complete sentences and never run to more than one or two. I had pretty good success with this strategy three years ago, building a news/story fetcher for sites without RSS: plain text --> lines --> filter out everything but contiguous blocks of sentences --> choose the largest remaining block.
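
For anyone who wants to try it, here is a rough sketch of that pipeline in Python. It is not the original fetcher. The function names, the regex, and the thresholds (`min_words`, `min_sentences`) are my own guesses at what "looks like a sentence" means and will need tuning per site. It assumes the HTML has already been converted to plain text.

```python
import re

# Assumed heuristic: sentence-ending punctuation, optionally followed by a
# closing quote or bracket, then whitespace or end of text.
SENTENCE_END = re.compile(r'[.!?]["\')\]]*(?:\s|$)')


def looks_like_prose(line, min_words=8):
    """Guess whether a line is running text rather than a menu or an ad."""
    return len(line.split()) >= min_words and bool(SENTENCE_END.search(line))


def largest_prose_block(text, min_sentences=3):
    """Return the largest contiguous run of prose-like lines, or ''."""
    blocks = []
    current = []
    for line in text.splitlines():
        stripped = line.strip()
        if not stripped:
            continue  # blank lines separate paragraphs, not blocks
        if looks_like_prose(stripped):
            current.append(stripped)
        else:
            # A non-prose line (menu, ad, footer) ends the current block.
            if current:
                blocks.append(current)
            current = []
    if current:
        blocks.append(current)

    # Drop blocks with too few sentences to be an article.
    candidates = []
    for block in blocks:
        joined = "\n".join(block)
        if len(SENTENCE_END.findall(joined)) >= min_sentences:
            candidates.append(joined)

    # "Largest" is measured in characters here; word count would also work.
    return max(candidates, key=len, default="")
```

Call `largest_prose_block(page_text)` on the text of a page. An empty string means nothing on the page looked enough like an article, which is a useful signal in itself (index pages, galleries and so on).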