in reply to Re^3: Module to extract text from HTML
in thread Module to extract text from HTML
If I understand correctly that you control the websites and the formatting of their content, perhaps you could tag the content yourself, either with HTML comments or, better, with custom data attributes on the HTML tags, e.g. <p data-purpose="description" data-index="1">blah blah</p>, and then simply reconstruct the text content from the HTML by looking for those attributes.
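To make the idea concrete, here is a rough sketch of the extraction side. It is written in Python using only the standard library so that it runs without installing anything; the same approach should carry over to whichever HTML parsing module you settle on. The class name TaggedTextParser and the function extract_tagged are made up for this illustration, and the sketch assumes that data-index, when present, holds an integer and that tagged elements are not nested inside other tagged elements.

```python
from html.parser import HTMLParser


class TaggedTextParser(HTMLParser):
    """Collect text from elements carrying a data-purpose attribute (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.records = []     # finished (purpose, index, text) tuples
        self._current = None  # state for the tagged element being read, if any

    def handle_starttag(self, tag, attrs):
        attr = dict(attrs)
        if self._current is None:
            if "data-purpose" in attr:
                self._current = {
                    "tag": tag,
                    "depth": 1,
                    "purpose": attr["data-purpose"],
                    # Assumption: data-index is an integer; default to 0 if absent.
                    "index": int(attr.get("data-index") or 0),
                    "chunks": [],
                }
        elif tag == self._current["tag"]:
            # Same tag name nested inside the tagged element: track depth
            # so the matching end tag is the one that closes the record.
            self._current["depth"] += 1

    def handle_data(self, data):
        if self._current is not None:
            self._current["chunks"].append(data)

    def handle_endtag(self, tag):
        cur = self._current
        if cur is None or tag != cur["tag"]:
            return
        cur["depth"] -= 1
        if cur["depth"] == 0:
            # Join the pieces and collapse runs of whitespace.
            text = " ".join("".join(cur["chunks"]).split())
            self.records.append((cur["purpose"], cur["index"], text))
            self._current = None


def extract_tagged(html):
    """Return (purpose, index, text) tuples sorted by purpose, then index."""
    parser = TaggedTextParser()
    parser.feed(html)
    parser.close()
    return sorted(parser.records, key=lambda r: (r[0], r[1]))


if __name__ == "__main__":
    sample = (
        '<div><p data-purpose="description" data-index="2">second part</p>'
        '<span>untagged, so ignored</span>'
        '<p data-purpose="description" data-index="1">first <b>part</b></p></div>'
    )
    for purpose, index, text in extract_tagged(sample):
        print(purpose, index, text)
```

Sorting on data-index means the order of the pieces in the page no longer matters, which is the main thing the extra attributes buy you over scraping untagged markup.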
Re^5: Module to extract text from HTML
by Bod (Parson) on Mar 01, 2024 at 15:47 UTC