in reply to How do you scan HTML?

If you can live with the memory footprint, stick with it. Personally, I would use HTML::TokeParser::Simple (or HTML::Parser if the job called for it). It too has an XML counterpart (XML::TokeParser). I don't care to provide an example; there are tons already (use Super Search), and lots of them are in "how do you scan HTML" type threads.
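For anyone who does want a starting point, here is a minimal sketch of pulling link targets out of a document with HTML::TokeParser::Simple. It walks the token stream one token at a time instead of building a tree, which is where the memory savings come from. The sample HTML and the choice of collecting `href` attributes are assumptions for illustration only.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use HTML::TokeParser::Simple;

# Illustrative sample document (an assumption, not from the thread).
my $html = <<'HTML';
<html><body>
<p>See <a href="http://perlmonks.org/">PerlMonks</a> and
<a href="http://search.cpan.org/">CPAN</a>.</p>
</body></html>
HTML

# A single scalar reference is treated as the document content itself.
my $p = HTML::TokeParser::Simple->new(\$html);

my @links;
while ( my $token = $p->get_token ) {
    # Skip everything except <a> start tags; no document tree is built.
    next unless $token->is_start_tag('a');
    my $href = $token->get_attr('href');
    push @links, $href if defined $href;
}

print "$_\n" for @links;
```

Passing a filename or an open filehandle to `new()` instead of a scalar reference should let the parser read the input incrementally rather than holding the whole file in memory.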

MJD says "you can't just make shit up and expect the computer to know what you mean, retardo!"
I run a Win32 PPM repository for perl 5.6.x and 5.8.x -- I take requests (README).
** The third rule of perl club is a statement of fact: pod is sexy.