Also, I've been hunting around trying to find out exactly what should be extracted from the JPEG to use in the PDF and can't find too much there either.

Yeah, you're right: Adobe's PDF Reference isn't exactly clear on what a stream suitable for the DCTDecode filter (see p. 60) should look like. However, trial and error shows that a plain JFIF file (JPEG File Interchange Format, i.e. what regular .jpg files are) works fine. IOW, you don't need to extract anything from the JPEG file; just copy the file as-is (including the header segments) into the stream section of your PDF object declaration...
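A minimal Python sketch of the "copy it as-is" step follows. The function name `load_jpeg_stream` is hypothetical. The main thing it illustrates is reading the file in binary mode, so that no newline translation can alter the bytes; the only check it makes is for the SOI marker (`FF D8`) that every JPEG file starts with.

```python
def load_jpeg_stream(path: str) -> bytes:
    """Return the JPEG file's bytes unchanged, for use as DCTDecode stream data."""
    # Binary mode: the bytes must reach the PDF exactly as they are on disk.
    with open(path, "rb") as fh:
        data = fh.read()
    # Every JPEG file (JFIF or otherwise) begins with the SOI marker FF D8.
    if not data.startswith(b"\xff\xd8"):
        raise ValueError(f"{path} does not start with a JPEG SOI marker")
    # No extraction or re-encoding: the whole file becomes the stream content.
    return data
```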

For example, an image object declaration could look like this:

    1 0 obj
    << /Type /XObject
       /Subtype /Image
       /Width $WIDTH
       /Height $HEIGHT
       /ColorSpace /DeviceRGB
       /BitsPerComponent 8
       /Length $STREAMSIZE_IN_BYTES
       /Filter /DCTDecode
    >>
    stream
    ... entire jpeg file contents here ...
    endstream
    endobj

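(Replace $WIDTH, $HEIGHT and $STREAMSIZE_IN_BYTES with the appropriate values, of course. Since the stream holds the entire file, $STREAMSIZE_IN_BYTES is simply the size of the JPEG file in bytes. The ColorSpace and BitsPerComponent settings shown should be fine for most typical color JPEG files; a grayscale JPEG would need /DeviceGray instead.)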
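If you'd rather not hard-code the placeholders, width, height and component count can be read from the JPEG's start-of-frame (SOF) segment. Here's a minimal Python sketch; `jpeg_info` and `image_xobject` are hypothetical names. It assumes 8-bit samples (the usual baseline case) and maps 1/3/4 components to /DeviceGray, /DeviceRGB and /DeviceCMYK. Note that CMYK JPEGs written by Adobe software are often stored inverted and may additionally need a /Decode array, which this sketch ignores.

```python
import struct

# Start-of-frame markers carrying the image dimensions. C4 (DHT), C8 (JPG)
# and CC (DAC) sit in the same range but are not frame headers.
SOF_MARKERS = {0xC0, 0xC1, 0xC2, 0xC3, 0xC5, 0xC6, 0xC7,
               0xC9, 0xCA, 0xCB, 0xCD, 0xCE, 0xCF}

# Number of colour components -> PDF device colour space (assumed mapping)
COLORSPACE_BY_COMPONENTS = {1: "/DeviceGray", 3: "/DeviceRGB", 4: "/DeviceCMYK"}


def jpeg_info(data: bytes) -> tuple:
    """Return (width, height, components) from the first SOF segment."""
    if data[:2] != b"\xff\xd8":
        raise ValueError("not a JPEG: missing SOI marker")
    pos = 2
    while pos + 4 <= len(data):
        if data[pos] != 0xFF:
            raise ValueError(f"expected a marker at offset {pos}")
        marker = data[pos + 1]
        if marker == 0xFF:                 # fill byte before a marker
            pos += 1
            continue
        if marker == 0x01 or 0xD0 <= marker <= 0xD8:
            pos += 2                       # standalone marker, no length field
            continue
        if marker in (0xD9, 0xDA):         # EOI or start of scan: no SOF seen
            break
        (seg_len,) = struct.unpack(">H", data[pos + 2:pos + 4])
        if marker in SOF_MARKERS:
            # SOF payload: precision (1 byte), height (2), width (2), components (1)
            _precision, height, width, ncomp = struct.unpack(
                ">BHHB", data[pos + 4:pos + 10])
            return width, height, ncomp
        pos += 2 + seg_len                 # length counts itself, not the marker
    raise ValueError("no SOF segment found")


def image_xobject(obj_num: int, jpeg: bytes) -> bytes:
    """Wrap the unmodified JPEG bytes in an image XObject with filled-in values."""
    width, height, ncomp = jpeg_info(jpeg)
    colorspace = COLORSPACE_BY_COMPONENTS[ncomp]  # KeyError for unusual counts
    header = (
        f"{obj_num} 0 obj\n"
        f"<< /Type /XObject /Subtype /Image /Width {width} /Height {height}\n"
        f"   /ColorSpace {colorspace} /BitsPerComponent 8\n"
        f"   /Length {len(jpeg)} /Filter /DCTDecode >>\n"
        "stream\n"
    ).encode("ascii")
    # The EOL after 'stream' begins the data; the EOL before 'endstream'
    # is not counted in /Length.
    return header + jpeg + b"\nendstream\nendobj\n"
```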

Well, I guess I'll leave it at that for the moment, because I'm not at all sure that's what you were asking... ;) In any case, if you want me to elaborate on this rather low-level approach, just say so. I could also put up a minimal working example somewhere, containing nothing but the above object plus the absolutely essential boilerplate, so you can more easily examine the details in context.
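For a rough idea of what such a minimal example involves, here's a Python sketch (the function name `minimal_pdf` is hypothetical) that wraps a JPEG in the smallest set of objects a one-page PDF needs: catalog, page tree, page, the image XObject and a content stream that paints the image at one point per pixel. It then writes the xref table from the byte offsets it recorded. It assumes an 8-bit JPEG and that you pass in the correct width, height and colour space yourself.

```python
def minimal_pdf(jpeg: bytes, width: int, height: int,
                colorspace: str = "/DeviceRGB") -> bytes:
    """Build a one-page PDF showing the JPEG at 1 pt per pixel (sketch)."""
    # Content stream: scale the unit image square to width x height, then paint it.
    content = f"q {width} 0 0 {height} 0 0 cm /Im1 Do Q".encode("ascii")
    objects = [
        b"<< /Type /Catalog /Pages 2 0 R >>",                      # 1
        b"<< /Type /Pages /Kids [3 0 R] /Count 1 >>",              # 2
        (f"<< /Type /Page /Parent 2 0 R /MediaBox [0 0 {width} {height}] "
         f"/Resources << /XObject << /Im1 4 0 R >> >> "
         f"/Contents 5 0 R >>").encode("ascii"),                   # 3
        (f"<< /Type /XObject /Subtype /Image /Width {width} /Height {height} "
         f"/ColorSpace {colorspace} /BitsPerComponent 8 "
         f"/Length {len(jpeg)} /Filter /DCTDecode >>\nstream\n").encode("ascii")
        + jpeg + b"\nendstream",                                   # 4: the JPEG as-is
        f"<< /Length {len(content)} >>\nstream\n".encode("ascii")
        + content + b"\nendstream",                                # 5
    ]
    # Header plus a comment line with high-bit bytes, marking the file as binary.
    out = bytearray(b"%PDF-1.4\n%\xe2\xe3\xcf\xd3\n")
    offsets = []
    for num, body in enumerate(objects, start=1):
        offsets.append(len(out))          # byte offset of "N 0 obj"
        out += f"{num} 0 obj\n".encode("ascii") + body + b"\nendobj\n"
    xref_pos = len(out)
    # Each xref entry is exactly 20 bytes, including the two-byte " \n" ending.
    out += f"xref\n0 {len(objects) + 1}\n".encode("ascii")
    out += b"0000000000 65535 f \n"
    for off in offsets:
        out += f"{off:010d} 00000 n \n".encode("ascii")
    out += (f"trailer\n<< /Size {len(objects) + 1} /Root 1 0 R >>\n"
            f"startxref\n{xref_pos}\n%%EOF\n").encode("ascii")
    return bytes(out)
```

To use it, read the JPEG (say a hypothetical `photo.jpg`) with `open(..., "rb")`, and write `minimal_pdf(jpeg, width, height)` to a file opened with `"wb"`.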

Having said that, I'd like to point out that, as other monks have suggested, a higher-level approach using CPAN modules is almost certainly the way to go, unless you really want to do it all yourself from scratch, for the learning experience or whatever.

Actually, it might make more sense to just modify your existing converter tool so that it outputs the additional PDF code directly while creating the PDF in the first place... Manually inserting an image into an existing PDF file is quite a PITA, mainly because all objects in a PDF file are indexed via a lookup table (the cross-reference or "xref" table) containing each object's byte offset in the file. As soon as you start shifting objects around (e.g. by inserting a new one), you have to adjust the offsets of every object that follows, as well as the startxref pointer to the table itself...
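To make the offset problem concrete, here's a naive Python sketch (hypothetical names `scan_object_offsets` and `build_xref`) that rebuilds an xref table after an edit by rescanning the file for `N G obj` headers. It assumes a simple, uncompressed PDF: no object streams or cross-reference streams, LF line endings, object numbers contiguous from 1, and no binary stream data that happens to contain a matching line. Real-world files frequently break these assumptions, which is one more reason to prefer a CPAN module. You'd still have to update /Size and startxref in the trailer yourself.

```python
import re

# "N G obj" at the start of a line (assumes LF line endings)
OBJ_HEADER = re.compile(rb"(?m)^(\d+) (\d+) obj\b")


def scan_object_offsets(pdf: bytes) -> dict:
    """Map object number -> (byte offset, generation); later definitions win."""
    offsets = {}
    for m in OBJ_HEADER.finditer(pdf):
        offsets[int(m.group(1))] = (m.start(), int(m.group(2)))
    return offsets


def build_xref(offsets: dict) -> bytes:
    """Format a single classic xref section for objects 0..max (sketch)."""
    size = max(offsets) + 1
    if set(offsets) != set(range(1, size)):
        raise ValueError("this sketch needs object numbers contiguous from 1")
    rows = [f"xref\n0 {size}\n", "0000000000 65535 f \n"]
    for num in range(1, size):
        off, gen = offsets[num]
        rows.append(f"{off:010d} {gen:05d} n \n")   # 20 bytes per entry
    return "".join(rows).encode("ascii")
```

Inserting an object anywhere in the file moves every later `N G obj` header by exactly the inserted length, so all of those xref entries change, while objects before the insertion point keep their offsets.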


In reply to Re^3: PDF and Image Insertion by almut
in thread PDF and Image Insertion by rpike
