The problem is that there are several versions of the PDF format (from 1.0 to 1.7). Over the years, many extensions have been introduced, and some of the newer ones are not supported by CAM::PDF. One of them (apparently) is compressed xref tables, known as cross-reference streams and introduced in PDF 1.5. The xref table is a list of byte offsets pointing to where the individual objects are stored within the file, and in older versions it was always stored uncompressed. This newer feature is used in the sample PDF file you linked to (which is PDF-1.6).
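To check whether a given file is likely to be affected, something like the following might help. It is only a rough sketch: the function names `pdf_version` and `uses_xref_stream` are made up for illustration, and the second one is a heuristic that simply looks for the `/Type /XRef` dictionary entry that marks a cross-reference stream.

```shell
# Print the PDF version declared in the file header, e.g. "1.6".
# (Hypothetical helper; reads only the first 8 bytes: "%PDF-x.y".)
pdf_version() {
    head -c 8 "$1" | sed -n 's/^%PDF-\([0-9]\.[0-9]\).*/\1/p'
}

# Heuristic check: succeed if the file appears to contain a
# cross-reference stream, i.e. a dictionary with "/Type /XRef".
# -a makes grep treat the (binary) PDF as text.
uses_xref_stream() {
    grep -a -q '/Type */XRef' "$1"
}
```

For example, `pdf_version in.pdf` prints the header version, and `uses_xref_stream in.pdf && echo "has xref stream"` reports whether the heuristic matched.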
You can often work around such problems by using another tool to change the internal format of the PDF file. qpdf is a pretty good one, which provides quite a number of options to play with. For example, you could try:
$ qpdf --stream-data=uncompress in.pdf out.pdf
(and optionally re-compress it afterwards with --stream-data=compress, if file size matters)
After applying this procedure to the PDF in question, the converted file(s) could successfully be read by CAM::PDF.
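If the conversion has to be repeated for several files, it could be wrapped in a small shell function. This is a sketch under assumptions: `normalize_pdf` is a made-up name, and the extra `--object-streams=disable` option is an addition that was not part of the procedure tested above. It asks qpdf to write the file without object streams, which should also give a classic uncompressed xref table.

```shell
# Rewrite a PDF with uncompressed streams and without object streams.
# Usage: normalize_pdf in.pdf out.pdf
# (Hypothetical wrapper; --object-streams=disable is an assumption
# added here, not part of the command tested in this post.)
normalize_pdf() {
    if ! command -v qpdf >/dev/null 2>&1; then
        echo "qpdf not found in PATH" >&2
        return 127
    fi
    qpdf --object-streams=disable --stream-data=uncompress "$1" "$2"
}
```

After `normalize_pdf in.pdf out.pdf`, try opening out.pdf with CAM::PDF.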
In reply to Re: Converting Text from PDF using CAM::PDF
by almut
in thread Converting Text from PDF using CAM::PDF
by mr_p