Hi, I did something like this by first converting the PDF to plain text with pdftotext, which gives decent output. There is also pdftohtml, which produces HTML and might be easier to parse. I'm not sure what information pdftohtml keeps that pdftotext strips, but I assume there's a difference.
BTW, both tools are available for *nix and Windows.
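In case it helps, here is a rough sketch of the conversion step, driven from Python. It assumes pdftotext (from xpdf or poppler) is on your PATH. The function names are my own, and the -layout flag and the "-" output argument (meaning stdout) are the options as I know them from those builds, so check your local man page.

```python
import shutil
import subprocess


def pdftotext_command(pdf_path, keep_layout=True):
    """Build the argument list for a pdftotext call that writes to stdout."""
    cmd = ["pdftotext"]
    if keep_layout:
        cmd.append("-layout")   # ask pdftotext to keep the physical page layout
    cmd += [pdf_path, "-"]      # "-" as the output file sends the text to stdout
    return cmd


def pdf_to_text(pdf_path, keep_layout=True):
    """Run pdftotext and return the extracted text as a string."""
    if shutil.which("pdftotext") is None:
        raise RuntimeError("pdftotext not found on PATH")
    result = subprocess.run(
        pdftotext_command(pdf_path, keep_layout),
        capture_output=True,
        check=True,             # raise CalledProcessError if conversion fails
    )
    # Assumption: the output is UTF-8; undecodable bytes are replaced.
    return result.stdout.decode("utf-8", errors="replace")
```

Calling pdf_to_text("report.pdf") (a made-up file name) should hand back the whole document as one string, which you can then split into lines and parse however you like.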
Cheerio,
--
Allolex
In reply to Re: Reading PDF files by allolex, in thread Reading PDF files by Helter