Also, I've been hunting around trying to find out exactly what should
be extracted from the JPEG to use in the PDF and can't find too much
there either.
Yeah, you're right: Adobe's Reference Documentation
isn't exactly clear on what a stream appropriate for the DCTDecode
filter (see p. 60) should look like. However, trial and error shows that
JFIF (JPEG File Interchange Format -- that's what regular JPEG files are) works fine. In other words, you don't need to extract anything
from the JPEG file; just copy the file as is (including the
header stuff) into the stream section of your PDF object declaration.
For example, an image object declaration could look like this:
1 0 obj <<
/Type /XObject /Subtype /Image
/Width $WIDTH
/Height $HEIGHT
/ColorSpace /DeviceRGB
/BitsPerComponent 8
/Length $STREAMSIZE_IN_BYTES
/Filter /DCTDecode
>> stream
... entire jpeg file contents here ...
endstream
endobj
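(Replace $WIDTH, $HEIGHT and $STREAMSIZE_IN_BYTES with the appropriate
values, of course; since the file is copied as is, $STREAMSIZE_IN_BYTES
is simply the size of the JPEG file in bytes. The ColorSpace and
BitsPerComponent settings shown should be fine for most typical color
JPEG files; a grayscale JPEG would need /DeviceGray instead.)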
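To make the placeholder substitution concrete, here is a minimal Python sketch that reads the dimensions from the JPEG's own frame header (the SOF marker) and wraps the untouched file bytes in the object declaration shown above. The function names jpeg_info and image_object are made up for illustration, and the sketch assumes a plain grayscale or RGB JPEG; anything else is rejected rather than guessed at.

```python
import struct

def jpeg_info(data: bytes):
    """Return (width, height, components, bits) from the first SOF marker."""
    if data[:2] != b"\xff\xd8":
        raise ValueError("not a JPEG (missing SOI marker)")
    pos = 2
    while pos + 4 <= len(data):
        if data[pos] != 0xFF:
            raise ValueError("expected a marker at offset %d" % pos)
        marker = data[pos + 1]
        if marker == 0xFF:                      # fill byte before a marker
            pos += 1
            continue
        if marker == 0x01 or 0xD0 <= marker <= 0xD7:
            pos += 2                            # markers without a length field
            continue
        (seglen,) = struct.unpack(">H", data[pos + 2:pos + 4])
        # SOF0..SOF15, excluding DHT (C4), JPG (C8) and DAC (CC)
        if 0xC0 <= marker <= 0xCF and marker not in (0xC4, 0xC8, 0xCC):
            bits, height, width, ncomp = struct.unpack(
                ">BHHB", data[pos + 4:pos + 10])
            return width, height, ncomp, bits
        pos += 2 + seglen                       # skip to the next segment
    raise ValueError("no SOF marker found")

# Assumption: only these two cases are handled in this sketch.
COLORSPACES = {1: "/DeviceGray", 3: "/DeviceRGB"}

def image_object(objnum: int, data: bytes) -> bytes:
    """Wrap a complete JPEG file in a PDF image XObject declaration."""
    width, height, ncomp, bits = jpeg_info(data)
    if ncomp not in COLORSPACES:
        raise ValueError("unsupported component count: %d" % ncomp)
    header = (
        f"{objnum} 0 obj <<\n"
        "/Type /XObject /Subtype /Image\n"
        f"/Width {width}\n"
        f"/Height {height}\n"
        f"/ColorSpace {COLORSPACES[ncomp]}\n"
        f"/BitsPerComponent {bits}\n"
        f"/Length {len(data)}\n"      # the whole file, copied as is
        "/Filter /DCTDecode\n"
        ">> stream\n"
    ).encode("ascii")
    return header + data + b"\nendstream\nendobj\n"

# Usage (the file name is a placeholder):
#   with open("photo.jpg", "rb") as fh:
#       obj = image_object(1, fh.read())
```

Note that the file has to be read in binary mode, and that /Length counts only the JPEG bytes, not the line break before endstream.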
Well, I guess I'll leave it at that for the moment, because I'm not
at all sure that's what you were asking... ;) -- In any case, if you
want me to elaborate on this rather low-level approach, just say so
(I could also put up a minimal working example somewhere,
containing nothing but the above object plus the absolutely essential
boilerplate, so you can more easily examine the details in
context).
Having said that, I'd like to point out that - as suggested by other
monks - a more high-level approach, using CPAN modules, is almost
certainly the way to go, unless you really want to do it yourself
from scratch, for the learning experience or whatever.
Actually, doing it yourself might make some sense if you'd just like to
modify your existing converter tool so that it directly outputs the additional
PDF code while creating the PDF in the first place. Manually
inserting an image into an existing PDF file is quite a PITA -- mainly
because all objects in a PDF file are indexed via a lookup table (the
cross-reference table) containing the objects' byte offsets in the file.
As soon as you begin to shift objects around (e.g. by inserting a new one),
you have to adjust the offsets of every object that comes after it...
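The offset bookkeeping is much less painful when the offsets are recorded while the file is being written, which is why generating the image object in the converter itself is the easier route. A rough Python sketch of that idea follows; build_pdf is a hypothetical helper, and it only illustrates the lookup table, not a complete, viewable document (a real file would also need page and content objects that reference the image).

```python
def build_pdf(objects):
    """Serialize object bodies (bytes) as objects 1..n plus the lookup table.

    Assumption for this sketch: object 1 is the document catalog.
    """
    out = bytearray(b"%PDF-1.4\n")
    offsets = []
    for num, body in enumerate(objects, start=1):
        offsets.append(len(out))          # byte offset where this object starts
        out += b"%d 0 obj\n" % num + body + b"\nendobj\n"

    xref_pos = len(out)                   # offset of the table itself
    out += b"xref\n0 %d\n" % (len(objects) + 1)
    out += b"0000000000 65535 f \n"       # entry 0 is always the free-list head
    for off in offsets:
        out += b"%010d 00000 n \n" % off  # each entry is exactly 20 bytes
    out += b"trailer\n<< /Size %d /Root 1 0 R >>\n" % (len(objects) + 1)
    out += b"startxref\n%d\n%%%%EOF\n" % xref_pos
    return bytes(out)
```

Because every offset is taken from the current length of the output buffer, adding or enlarging an object needs no manual adjustment; the table is simply rebuilt on each run.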