It's in the nature of PDF that text isn't stored as a running sequence of letters; each letter (or group of letters) may be positioned on the page separately, and the order in which letters and words appear inside the .pdf file need not bear any relation to the order in which the text appears on screen.
This makes parsing .pdf files extremely difficult.
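To illustrate the problem, here is a minimal Python sketch of how a reading order could be rebuilt from separately positioned fragments. It assumes some extractor has already produced (x, y, text) tuples; that tuple layout, the function name reading_order and the line_tolerance value are all made up for the example and are not what any particular tool does internally.

```python
from typing import List, Tuple

# Hypothetical fragment layout: (x, y, text) in PDF user space,
# where y grows upward from the bottom of the page.
Fragment = Tuple[float, float, str]


def reading_order(fragments: List[Fragment], line_tolerance: float = 2.0) -> List[str]:
    """Group fragments into lines by similar y, then sort each line by x."""
    # Top of the page first: larger y comes earlier.
    ordered = sorted(fragments, key=lambda f: -f[1])

    lines: List[List[Fragment]] = []
    for frag in ordered:
        # Same line if y is close to the first fragment of the current line.
        if lines and abs(lines[-1][0][1] - frag[1]) <= line_tolerance:
            lines[-1].append(frag)
        else:
            lines.append([frag])

    # Within a line, read left to right.
    return [
        " ".join(text for _, _, text in sorted(line, key=lambda f: f[0]))
        for line in lines
    ]


# Fragments listed in "file order", which differs from the visual order.
fragments = [
    (120.0, 700.0, "world"),
    (72.0, 688.0, "second"),
    (72.0, 700.0, "Hello"),
    (110.0, 688.5, "line"),
]
print(reading_order(fragments))  # ['Hello world', 'second line']
```

This only handles a simple single-column page. Multiple columns, rotated text, or fragments split in the middle of a word would need more work, which is part of why a dedicated extraction tool is usually the easier route.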
I used a program called pdftext.exe, which in most cases does quite well at extracting whole words, and post-processed the result with Perl.
Maybe it's worth a try for you as well.