Thanks again for your reply. Let me clarify a bit. Since I can read the documents in the browser, I know they contain only text, so OCR is not an issue. All the documents follow a similar set of templates, but the content changes for each. I have viewed hundreds of these, and any document that does not conform to a template will be skipped.
Your suggestion of downloading the file and then running a pdftotext tool on the local copy is in line with my current thinking, as long as it can be scripted and run without intervention. Are there any other suggestions I should examine?
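To make the "scripted and run without intervention" part concrete, here is a rough sketch of the batch step I have in mind. It is written in Python purely for illustration and assumes the PDFs have already been downloaded into a local directory and that a `pdftotext` binary (from poppler or xpdf) is on the PATH. The function names, the directory arguments, and the marker strings used to decide whether a document conforms to a template are all hypothetical placeholders, not anything taken from the real documents.

```python
import shutil
import subprocess
from pathlib import Path


def pdftotext_command(pdf_path, txt_path):
    """Build the pdftotext command line; -layout preserves the page layout."""
    return ["pdftotext", "-layout", str(pdf_path), str(txt_path)]


def conforms(text, required_markers):
    """Return True if every marker string expected by a template is present."""
    return all(marker in text for marker in required_markers)


def convert_all(pdf_dir, out_dir, required_markers):
    """Convert every PDF in pdf_dir; return (kept, skipped) lists of file names."""
    if shutil.which("pdftotext") is None:
        raise RuntimeError("pdftotext not found on PATH")
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    kept, skipped = [], []
    for pdf in sorted(Path(pdf_dir).glob("*.pdf")):
        txt = out_dir / (pdf.stem + ".txt")
        result = subprocess.run(pdftotext_command(pdf, txt), capture_output=True)
        if result.returncode != 0:
            # Conversion failed: skip the file instead of stopping the batch.
            skipped.append(pdf.name)
            continue
        text = txt.read_text(encoding="utf-8", errors="replace")
        if conforms(text, required_markers):
            kept.append(pdf.name)
        else:
            # Not one of the known templates: discard the text and move on.
            txt.unlink()
            skipped.append(pdf.name)
    return kept, skipped
```

A call such as `convert_all("downloads", "text", ["Invoice No:", "Total:"])` (with placeholder directory names and markers) would run unattended, and the returned `skipped` list shows which documents need a manual look.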
In reply to Re^4: Mechanize Firefox text Method
by halweitz
in thread Mechanize Firefox text Method
by halweitz