And here is the long-winded road of using the mech to save to PDF and then using pdftotext.
I'm still waiting for someone to suggest printing out, scanning back in, doing OCR, and having an AI fix the OCR errors. ;-)
Also, no traces of "just use a regex" so far. Which is really good.
Alexander
In reply to Re^3: Module to extract text from HTML by afoken, in thread Module to extract text from HTML by Bod