BTW, it's probably possible to write an ad hoc text extractor using heuristic rules and regular expressions that gets close to what you want without building a proper tree. Without trying it, I can't say whether HTML::TreeBuilder or something similar is really necessary, but my gut feeling is that it could help quite a bit.
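To give a rough idea of what such an ad hoc extractor might look like, here is a minimal sketch in Python (the regexes should carry over to Perl with little change). Everything in it is an assumption made for illustration: the function names `paragraphs` and `sentences`, the list of tags treated as paragraph boundaries, and the sentence-splitting rule are guesses, not anything the thread settled on.

```python
import html
import re

# Tags assumed to mark a paragraph boundary. This list is a guess and is
# certainly not exhaustive; adjust it for the pages you actually have.
BLOCK_TAGS = r"(?:p|div|br|h[1-6]|li|tr|blockquote)"


def paragraphs(markup):
    """Split HTML into plain-text paragraphs using regex heuristics only."""
    # Drop script/style elements and comments wholesale.
    markup = re.sub(r"(?is)<(script|style)\b.*?</\1\s*>", " ", markup)
    markup = re.sub(r"(?s)<!--.*?-->", " ", markup)
    # Turn block-level tags (opening or closing) into paragraph breaks.
    markup = re.sub(r"(?i)</?" + BLOCK_TAGS + r"\b[^>]*>", "\n\n", markup)
    # Remove every remaining (inline) tag.
    markup = re.sub(r"<[^>]+>", "", markup)
    # Decode entities only after the tags are gone, so that an escaped
    # "&lt;b&gt;" survives as literal text instead of being stripped.
    text = html.unescape(markup)
    # Split on blank lines and collapse whitespace inside each chunk.
    chunks = (re.sub(r"\s+", " ", c).strip() for c in re.split(r"\n\s*\n", text))
    return [c for c in chunks if c]


def sentences(paragraph):
    """Naive rule: break after . ! or ? when an uppercase letter or quote follows."""
    return re.split(r"(?<=[.!?])\s+(?=[A-Z\"'])", paragraph)


if __name__ == "__main__":
    sample = (
        "<html><body><h1>Title</h1>"
        "<p>First one. Second <b>bold</b> one!</p>"
        "<p>Tom &amp; Jerry ran.<br>New line here.</p>"
        "<script>var x = 1;</script></body></html>"
    )
    for para in paragraphs(sample):
        print(sentences(para))
```

The weak spots are the ones you'd expect from heuristics: abbreviations such as "Dr. Smith" get split in the middle, and unclosed or oddly nested tags can confuse the paragraph breaks. Those are the cases where a real parser starts to pay off.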
In reply to Re^3: Seperating HTML by paragraph, sentence
by mr_mischief
in thread Seperating HTML by paragraph, sentence
by downer