in reply to Re^3: Mechanize Firefox text Method
in thread Mechanize Firefox text Method
Thanks again for your reply. Let me clarify a bit. Since I can read the documents in the browser, I know they contain only text, so OCR is not an issue. All the documents follow a similar set of templates, but the content changes for each one. I have viewed hundreds of these, and any document that does not conform will be skipped.
Your suggestion of downloading the file and then running a pdftotext tool on the local copy is in line with my current thinking, as long as it can be scripted and run without intervention. Are there any other suggestions I should examine?
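For reference, here is a rough sketch of what the unattended convert-and-skip step might look like. It assumes the PDFs have already been saved locally and that the `pdftotext` command-line tool is installed and on the PATH. The helper names (`text_path_for`, `pdf_to_text`, `conforms`) and the marker strings are made up for illustration; the real template markers would come from the actual documents.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: derive the output .txt path for a local PDF.
sub text_path_for {
    my ($pdf) = @_;
    my $txt = $pdf;
    # Swap a trailing .pdf (any case) for .txt, else just append .txt.
    $txt .= '.txt' unless $txt =~ s/\.pdf\z/.txt/i;
    return $txt;
}

# Hypothetical helper: run pdftotext on one file.
# Returns the text file path on success, undef on failure.
sub pdf_to_text {
    my ($pdf) = @_;
    my $txt = text_path_for($pdf);
    # List-form system() bypasses the shell, so unusual file names are safe.
    # -layout asks pdftotext to keep the physical layout of the page.
    my $status = system('pdftotext', '-layout', $pdf, $txt);
    return $status == 0 ? $txt : undef;
}

# Hypothetical template check: every marker string must appear in the text.
sub conforms {
    my ($text, @markers) = @_;
    for my $marker (@markers) {
        return 0 if index($text, $marker) < 0;
    }
    return 1;
}

# Run only when executed as a script, e.g.: perl convert.pl downloads/*.pdf
unless (caller) {
    my @required = ('Name:', 'Date:');    # placeholder markers, an assumption
    for my $pdf (@ARGV) {
        my $txt = pdf_to_text($pdf);
        unless (defined $txt) {
            warn "pdftotext failed, skipping $pdf\n";
            next;
        }
        my $fh;
        unless (open $fh, '<', $txt) {
            warn "cannot read $txt: $!\n";
            next;
        }
        my $text = do { local $/; <$fh> };    # slurp the whole file
        close $fh;
        unless (conforms($text, @required)) {
            warn "does not match the template, skipping $pdf\n";
            next;
        }
        print "ok: $txt\n";
    }
}
```

Because every failure path only warns and moves on to the next file, a run like this should not need any intervention; non-conforming documents are simply reported on STDERR and skipped.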
Re^5: Mechanize Firefox text Method
by afoken (Chancellor) on May 05, 2013 at 18:27 UTC
by halweitz (Novice) on May 18, 2013 at 03:02 UTC