in reply to Stripping tags from a PerlMonks page.

Here's a quick and dirty solution I came up with. Ideally, you should ask vroom about a "basic info, no fancy stuff" data feed, if one doesn't exist already (similar to the Slashdot-style info boxes). Anyway, here it is:
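    {
        open(PAGE, "$PageToParse.html") or die "Could not open: $!\n";
        local $/;    # undef the record separator so the whole file comes back in one read
        while (<PAGE>) {
            m#<TITLE>(.*)</TITLE># and $title=$1;   # remember the title before cutting
            s#^.*?</TABLE>##s;                      # drop everything through the first </TABLE>
            s#<!-- nodelets start.*##s;             # drop the nodelets and everything after them
            print $_;   ## Or to a new file, etc.
        }
        close(PAGE);
    }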

Short, ugly, and to the point. The title is the only thing I can see worth keeping from everything up to the end of the first TABLE tag. Jettison all that, jettison everything after the nodelets, and add back in stuff like the title, <BODY>, </BODY>, etc. as you desire.
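
If you want the "add back in" step spelled out, here is a rough sketch of one way to do it. The sub name strip_page, the 'Untitled' fallback, and the sample markup are all made up for illustration; the only things taken from the real pages are the two cut points used above (the first </TABLE> and the "nodelets start" comment), so check them against an actual saved page before relying on this.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of the original snippet): takes the raw
# HTML of a saved page as one string and returns a stripped-down page.
sub strip_page {
    my ($html) = @_;

    # Grab the title first, since it lives in the part we throw away.
    my ($title) = $html =~ m#<TITLE>(.*?)</TITLE>#is;
    $title = 'Untitled' unless defined $title;    # assumed fallback

    # The same two cuts as in the snippet above.
    $html =~ s#^.*?</TABLE>##s;             # everything through the first </TABLE>
    $html =~ s#<!-- nodelets start.*##s;    # the nodelets and everything after

    # Wrap what is left in a minimal page skeleton.
    return "<HTML><HEAD><TITLE>$title</TITLE></HEAD>\n"
         . "<BODY>\n$html\n</BODY></HTML>\n";
}

# Made-up sample input, shaped the way the regexes expect.
my $sample = join "\n",
    '<HTML><HEAD><TITLE>Some node</TITLE></HEAD><BODY>',
    '<TABLE><TR><TD>site header</TD></TR></TABLE>',
    '<P>The node text.</P>',
    '<!-- nodelets start -->',
    '<TABLE><TR><TD>nodelets</TD></TR></TABLE>',
    '</BODY></HTML>';

print strip_page($sample);
```

Working on a string instead of a filehandle keeps the cutting separate from the file I/O, so the same sub can be fed from a slurped file or from whatever fetched the page.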