in reply to Re^2: Module to extract text from HTML
in thread Module to extract text from HTML
And here is the long-winded road of using the mech to save to PDF and then using pdftotext.
I'm still waiting for someone to suggest printing it out, scanning it back in, running OCR, and having an AI fix the OCR errors. ;-)
Also, no trace of "just use a regex" so far, which is really good.
Alexander
Re^4: Module to extract text from HTML
by bliako (Abbot) on Feb 29, 2024 at 17:35 UTC