in reply to Re^2: Seperating HTML by paragraph, sentence
in thread Seperating HTML by paragraph, sentence
BTW, it's probably possible to write an ad hoc text extractor, using heuristic rules and regular expressions, that gets close to what you want without building a proper tree. Without trying it, I can't say whether HTML::TreeBuilder or something similar is really necessary, but my gut feeling is that it would help quite a bit.
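To give an idea of what I mean by an ad hoc extractor, here is a rough, untuned sketch. It's in Python only to keep it short; the regexes carry over to Perl nearly unchanged. Everything in it is my own assumption rather than anything from your code: the name `extract_paragraphs`, the `BLOCK_TAGS` list of tags treated as paragraph boundaries, and the naive rule that a sentence ends at `.`, `!` or `?` followed by whitespace and a capital letter.

```python
import html
import re

# Assumption: these tags mark paragraph boundaries. Adjust for your documents.
BLOCK_TAGS = r"(?:p|div|br|li|h[1-6]|tr|blockquote)"


def extract_paragraphs(markup):
    """Return a list of paragraphs, each one a list of sentence strings."""
    # Drop script/style elements and comments, contents included.
    text = re.sub(r"(?is)<(script|style)\b.*?</\1\s*>", " ", markup)
    text = re.sub(r"(?s)<!--.*?-->", " ", text)
    # Turn block-level tags (opening or closing) into blank lines.
    text = re.sub(r"(?i)</?" + BLOCK_TAGS + r"\b[^>]*>", "\n\n", text)
    # Strip whatever tags are left (inline markup such as <b> or <a>).
    text = re.sub(r"<[^>]+>", "", text)
    # Decode entities such as &amp; once the tags are gone.
    text = html.unescape(text)

    paragraphs = []
    for chunk in re.split(r"\n\s*\n", text):
        chunk = " ".join(chunk.split())  # collapse runs of whitespace
        if not chunk:
            continue
        # Naive sentence split; abbreviations like "Dr. Smith" will fool it.
        sentences = re.split(r"(?<=[.!?])\s+(?=[A-Z])", chunk)
        paragraphs.append(sentences)
    return paragraphs
```

Calling `extract_paragraphs(page_source)` gives you one list of sentences per paragraph. It will trip over malformed markup, a `>` inside an attribute value, `<pre>` blocks and abbreviations, which is exactly where a real parser starts to pay off.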