The problem is that there are several versions of the PDF format (from 1.0 to 1.7). Over the years, many extensions have been introduced, and some of the newer ones are not supported by CAM::PDF. One of them (apparently) is compressed xref tables (cross-reference streams, introduced in PDF 1.5). The xref table is a list of byte offsets pointing to where the individual objects are stored within the file; in older versions it was always stored uncompressed. The sample PDF file you linked to (which is PDF-1.6) uses this newer feature.
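If you want a quick way to tell which kind of file you are dealing with, something like the following rough sketch may help. It prints the header version and guesses whether the file uses a cross-reference stream by looking for a `/Type /XRef` dictionary entry. The function name `pdf_xref_info` is made up for this example, and the check is only a heuristic (a file can contain both styles); it also assumes a `head` that supports `-c` and a `grep` that supports `-a`.

```shell
#!/bin/sh
# pdf_xref_info is a hypothetical helper, not part of qpdf or CAM::PDF.
# It prints the PDF header and a heuristic guess at the xref style.
pdf_xref_info() {
    file=$1
    # A PDF starts with a header such as "%PDF-1.6"; read the first 8 bytes.
    header=$(head -c 8 "$file")
    echo "header: $header"
    # A cross-reference stream is an object whose dictionary contains
    # /Type /XRef. grep -a makes grep treat the binary file as text.
    if grep -a -q '/Type[[:space:]]*/XRef' "$file"; then
        echo "xref: stream"
    else
        echo "xref: table"
    fi
}
```

Calling `pdf_xref_info in.pdf` on a file that reports "xref: stream" would suggest it is worth trying the conversion before handing it to CAM::PDF.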

You can often work around such problems by using another tool to change the internal format of the PDF file. qpdf is a pretty good one, which provides quite a number of options to play with. For example, you could try:

$ qpdf --stream-data=uncompress in.pdf out.pdf

(and optionally re-compress it with --stream-data=compress, if size matters)

After applying this procedure to the PDF in question, the converted file(s) could successfully be read by CAM::PDF.
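If you need to do this for more than one file, the two steps could be wrapped up roughly as follows. This is only a sketch: `normalize_pdf` is a hypothetical name, and the only qpdf options used are the two `--stream-data` modes mentioned above. It treats any non-zero exit status from qpdf as a failure, which also covers the case where qpdf only reports warnings.

```shell
#!/bin/sh
# normalize_pdf is a hypothetical helper; only the qpdf flags come from the post.
# Usage: normalize_pdf in.pdf out.pdf [yes]   ("yes" re-compresses afterwards)
normalize_pdf() {
    src=$1
    dst=$2
    recompress=${3:-no}
    tmp=$(mktemp) || return 1
    # Step 1: rewrite the file with all stream data uncompressed.
    if ! qpdf --stream-data=uncompress "$src" "$tmp"; then
        rm -f "$tmp"
        return 1
    fi
    if [ "$recompress" = yes ]; then
        # Step 2 (optional): compress the streams again if size matters.
        qpdf --stream-data=compress "$tmp" "$dst"
        status=$?
        rm -f "$tmp"
        return $status
    fi
    # No recompression requested: the uncompressed file is the result.
    mv -f "$tmp" "$dst"
}
```

For example, `normalize_pdf in.pdf out.pdf` produces the uncompressed version, and `normalize_pdf in.pdf out.pdf yes` the re-compressed one; either result can then be opened with CAM::PDF as usual.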

In reply to Re: Converting Text from PDF using CAM::PDF by almut
in thread Converting Text from PDF using CAM::PDF by mr_p
