in reply to Recommendation on a module for HTML/XML extraction.

Decent overview here ...

The way forward always starts with a minimal test.