If I understood correctly that you control the websites and the formatting of their content, perhaps you could tag the content yourself, either with HTML comments or, better, with custom data attributes on the HTML tags, e.g. <p data-purpose="description" data-index="1">blah blah</p> (note that attribute names are not quoted, only their values). Then you can reconstruct the text content from the HTML by picking out just the tagged elements.
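To illustrate the idea, here is a minimal sketch in Python using only the standard library's html.parser. It is not tied to any particular module from this thread; the class and function names (TaggedTextExtractor, extract_tagged_text) are made up for the example, and it assumes that every element of interest carries a data-purpose attribute and a numeric data-index that gives the order in which the pieces should be reassembled.

```python
from html.parser import HTMLParser


class TaggedTextExtractor(HTMLParser):
    """Collect the text of elements that carry a data-purpose attribute."""

    def __init__(self):
        super().__init__()
        self.chunks = []       # finished (purpose, index, text) tuples
        self._current = None   # (purpose, index, parts) while inside a tagged element
        self._tag = None       # tag name of the tagged element being read
        self._depth = 0        # nesting depth of same-named tags inside it

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if self._current is None:
            if "data-purpose" in attrs:
                # Assumption: data-index, when present, is an integer.
                index = int(attrs.get("data-index") or 0)
                self._current = (attrs["data-purpose"], index, [])
                self._tag = tag
                self._depth = 1
        elif tag == self._tag:
            # Same tag nested inside the tagged element: track it so we
            # close on the matching end tag, not the first one we see.
            self._depth += 1

    def handle_endtag(self, tag):
        if self._current is None or tag != self._tag:
            return
        self._depth -= 1
        if self._depth == 0:
            purpose, index, parts = self._current
            self.chunks.append((purpose, index, "".join(parts).strip()))
            self._current = None
            self._tag = None

    def handle_data(self, data):
        # Text outside tagged elements is ignored.
        if self._current is not None:
            self._current[2].append(data)


def extract_tagged_text(html):
    """Return (purpose, index, text) tuples ordered by data-index."""
    parser = TaggedTextExtractor()
    parser.feed(html)
    parser.close()
    return sorted(parser.chunks, key=lambda chunk: chunk[1])


if __name__ == "__main__":
    page = (
        '<div><p data-purpose="description" data-index="2">second <b>part</b></p>'
        '<p>ignored</p>'
        '<p data-purpose="title" data-index="1">first</p></div>'
    )
    for purpose, index, text in extract_tagged_text(page):
        print(index, purpose, text)
```

Inline markup inside a tagged element (the <b> above) is flattened into its text, and untagged elements are skipped entirely, so navigation, adverts and other clutter never reach the output. The same approach should carry over to any HTML parser that exposes attributes.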
In reply to Re^4: Module to extract text from HTML
by bliako
in thread Module to extract text from HTML
by Bod