in reply to Re^3: read pdf text in hidden layer?
in thread read pdf text in hidden layer?
Wow, thanks, this made a big difference. I didn't see these examples mentioned in the CAM::PDF docs. These are really useful utilities.
I still hold to what I said earlier. I think docs/errors need more 'basic stuff'.
I have a PDF that fails on open with the error 'Expected stream open tag'. It's a plain die() with no tangible info. For example, which method died? Which module? (It's from CAM::PDF::parseStream(). I had to xargs grep the source to find it. That's fine, I can do that, but not everyone should have to.) Some Carp::confess() would be nice here.
pdftotext from xpdf works fine with the same file that CAM::PDF chokes on. Dunno why. The PDF may be corrupt.
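In the meantime, a caller can get a stack trace without patching the module. Below is a minimal sketch using only core Perl: it installs a local $SIG{__DIE__} handler that re-throws through Carp::confess(). The names parse_stream_stub and with_backtrace are mine, made up for illustration; the stub only stands in for a library routine that dies with a bare message.

```perl
use strict;
use warnings;
use Carp ();

# Hypothetical stand-in for a library routine that dies with a bare message.
sub parse_stream_stub { die "Expected stream open tag\n" }

# Run a code ref with every die() inside it upgraded to a full stack trace.
sub with_backtrace {
    my ($code) = @_;
    # The handler is local, so it is removed again when this sub returns.
    local $SIG{__DIE__} = sub { Carp::confess(@_) };
    return $code->();
}

my $ok = eval { with_backtrace(\&parse_stream_stub); 1 };
print STDERR "Died with trace:\n$@" unless $ok;
```

With the real module, the same wrapper could presumably go around the open call, e.g. with_backtrace(sub { CAM::PDF->new($file) }), and the trace should then name the sub that died.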
Thank you for pointing these out!
update
It seems CAM::PDF::parseStream() expects the PDF stream tag to be followed by a newline. Some of these files, moved back and forth between NTFS and ext3, seem to have had their newlines mangled (you know the old story with ftp binmode). So maybe xpdf accepts ^M as a newline but CAM::PDF does not? Maybe I'll write to the author.
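To check that theory on a given file, here is a rough diagnostic sketch in core Perl. It reads the file in raw mode and counts which line ending follows each "stream" keyword. As far as I know the PDF spec wants CRLF or a bare LF there, not a lone CR. The sub name classify_stream_eols is made up for this post, and the regex is naive: binary stream data that happens to contain the word "stream" will be counted too, so treat the numbers as a hint only.

```perl
use strict;
use warnings;

# Count the line endings that follow each "stream" keyword in raw PDF bytes.
sub classify_stream_eols {
    my ($bytes) = @_;
    my %count = (LF => 0, CRLF => 0, CR_only => 0, other => 0);
    # (?<!end) skips "endstream"; the empty alternative catches anything else.
    while ($bytes =~ /(?<!end)stream(\r\n|\n|\r|)/g) {
        my $eol  = $1;
        my $kind = $eol eq "\r\n" ? 'CRLF'
                 : $eol eq "\n"   ? 'LF'
                 : $eol eq "\r"   ? 'CR_only'
                 :                  'other';
        $count{$kind}++;
    }
    return \%count;
}

if (@ARGV) {
    my $file = shift @ARGV;
    # :raw keeps Perl from translating line endings while reading.
    open my $fh, '<:raw', $file or die "Cannot open $file: $!";
    local $/;    # slurp the whole file
    my $bytes = <$fh>;
    close $fh;
    my $count = classify_stream_eols($bytes);
    printf "%-8s %d\n", $_, $count->{$_} for sort keys %$count;
}
```

Run it with the PDF path as the only argument. A nonzero CR_only count would fit the mangled-newline theory.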