Re^2: PDF and Image Insertion

Some of the people I've talked to who seem to know CPAN pretty well have given me the impression that it isn't flexible enough to do what I'm trying to get it to do. Also, I've been hunting around trying to find out exactly what should be extracted from the JPEG to use in the PDF and can't find too much there either. I'm working from a small doc I found on the web that helps slightly with extracting data from the JPEG. Hopefully that will lead to more info and to getting the darn thing up and running. I find Adobe's documentation boring and bogged down with crap, but I have referenced it a bunch of times.


Replies are listed 'Best First'.

Re^3: PDF and Image Insertion
by almut (Canon) on Nov 27, 2006 at 19:54 UTC

Also, I've been hunting around trying to find out exactly what should be extracted from the JPEG to use in the PDF and can't find too much there either.

Yeah, you're right, Adobe's Reference Documentation isn't exactly clear on what a stream appropriate for the DCTDecode filter (see p. 60) would look like. However, trial and error shows that JFIF (JPEG File Interchange Format, which is what regular JPEG files are) is fine, apparently. In other words, you don't need to extract anything from the JPEG file; just copy the file as is (including the header stuff) into the stream section of your PDF object declaration...

For example, an image object declaration could look like

1 0 obj <<
  /Type /XObject /Subtype /Image
  /Width $WIDTH
  /Height $HEIGHT
  /ColorSpace /DeviceRGB
  /BitsPerComponent 8
  /Length $STREAMSIZE_IN_BYTES
  /Filter /DCTDecode
>> stream
... entire jpeg file contents here ...
endstream
endobj

(Replace $WIDTH, $HEIGHT, and $STREAMSIZE_IN_BYTES with the appropriate values, of course. The ColorSpace and BitsPerComponent settings shown should be fine for most typical color JPEG files.)
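For what it's worth, here is a minimal sketch of filling in those placeholders, written in Python rather than Perl purely for illustration. The helper names `jpeg_info` and `image_xobject` are made up for this example and are not part of any library. It assumes an ordinary JFIF file, reads width, height and component count from the first SOFn segment, and only handles 1-component (gray) and 3-component (color) files; anything else is rejected rather than guessed at.

```python
import struct

def jpeg_info(data: bytes):
    """Return (width, height, components) from the first SOFn segment."""
    if data[:2] != b"\xff\xd8":
        raise ValueError("not a JPEG (missing SOI marker)")
    pos = 2
    while pos + 4 <= len(data):
        if data[pos] != 0xFF:
            raise ValueError("lost marker sync")
        marker = data[pos + 1]
        if marker == 0xFF:                      # fill byte before a marker
            pos += 1
            continue
        if marker == 0x01 or 0xD0 <= marker <= 0xD9:
            pos += 2                            # standalone marker, no length field
            continue
        (seglen,) = struct.unpack(">H", data[pos + 2:pos + 4])
        # SOF0..SOF15, excluding DHT (C4), JPG (C8) and DAC (CC)
        if 0xC0 <= marker <= 0xCF and marker not in (0xC4, 0xC8, 0xCC):
            _precision, height, width, ncomp = struct.unpack(
                ">BHHB", data[pos + 4:pos + 10])
            return width, height, ncomp
        pos += 2 + seglen                       # skip marker plus segment
    raise ValueError("no SOF segment found")

# Assumption for this sketch: only gray and RGB-style files are handled.
COLORSPACE = {1: "/DeviceGray", 3: "/DeviceRGB"}

def image_xobject(objnum: int, data: bytes) -> bytes:
    """Wrap an unmodified JPEG file in a PDF image object."""
    width, height, ncomp = jpeg_info(data)
    if ncomp not in COLORSPACE:
        raise ValueError(f"unsupported component count: {ncomp}")
    head = (
        f"{objnum} 0 obj <<\n"
        f"  /Type /XObject /Subtype /Image\n"
        f"  /Width {width}\n"
        f"  /Height {height}\n"
        f"  /ColorSpace {COLORSPACE[ncomp]}\n"
        f"  /BitsPerComponent 8\n"
        f"  /Length {len(data)}\n"      # byte count of the JPEG file itself
        f"  /Filter /DCTDecode\n"
        f">> stream\n"
    ).encode("ascii")
    return head + data + b"\nendstream\nendobj\n"
```

Called as `image_xobject(1, open("photo.jpg", "rb").read())` (with `photo.jpg` standing in for your own file), this returns the bytes of the whole object, with the JPEG copied in untouched.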

Well, I guess I'll leave it at that for the moment, because I'm not sure at all if that's what you were asking... ;) In any case, if you want me to elaborate on this rather low-level approach, just say so... (Also, I could put up a minimal working example somewhere, containing nothing but the above object plus the absolutely essential boilerplate, so you can more easily examine the details in context.)

Having said that, I'd like to point out that, as suggested by other monks, a more high-level approach using CPAN modules is almost certainly the way to go, unless you really want to do it yourself from scratch, for the learning experience or whatever.

Actually, it might make some sense to just modify your existing converter tool so that it directly outputs the additional PDF code while creating the PDF in the first place... Manually inserting an image into an existing PDF file is quite a PITA, mainly because all objects in a PDF file are indexed via a lookup table (the cross-reference table) containing the objects' byte offsets in the file. As soon as you begin to shift objects around (e.g. by inserting a new one), you have to adjust the offset of every object that moved...
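To make the offset bookkeeping concrete, here is a rough sketch (in Python, for illustration only; `build_pdf` is a hypothetical helper, and the three objects are just a bare page skeleton with no image or content). It writes the objects one after another, records each one's byte offset, and then emits the lookup table and trailer from those offsets. Growing the first object by a few bytes moves every later object, so every later table entry and the startxref value change too.

```python
def build_pdf(objects):
    """objects: list of object bodies (bytes); body i becomes object i+1."""
    out = bytearray(b"%PDF-1.4\n")
    offsets = []
    for num, body in enumerate(objects, start=1):
        offsets.append(len(out))            # byte offset where this object starts
        out += f"{num} 0 obj\n".encode("ascii") + body + b"\nendobj\n"
    xref_pos = len(out)                      # offset of the table itself
    out += f"xref\n0 {len(objects) + 1}\n".encode("ascii")
    out += b"0000000000 65535 f \n"          # entry for the free object 0
    for off in offsets:
        out += f"{off:010d} 00000 n \n".encode("ascii")   # 20 bytes per entry
    out += (
        f"trailer\n<< /Size {len(objects) + 1} /Root 1 0 R >>\n"
        f"startxref\n{xref_pos}\n%%EOF\n"
    ).encode("ascii")
    return bytes(out), offsets

objs = [
    b"<< /Type /Catalog /Pages 2 0 R >>",
    b"<< /Type /Pages /Kids [3 0 R] /Count 1 >>",
    b"<< /Type /Page /Parent 2 0 R /MediaBox [0 0 200 200] >>",
]
pdf, offsets = build_pdf(objs)

# Same file, but object 1 is a few bytes longer: objects 2 and 3 both move.
objs2 = list(objs)
objs2[0] = b"<< /Type /Catalog /Pages 2 0 R /PageLayout /SinglePage >>"
pdf2, offsets2 = build_pdf(objs2)
print(offsets, offsets2)
```

Generating the file in one pass like this is why having the converter emit the image object itself is the easier route: the offsets fall out of the writing loop instead of having to be patched afterwards.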
