PerlMonks  

Re: Merging images into the content of a PDF document

by vr (Curate)
on May 22, 2018 at 16:21 UTC


in reply to Merging images into the content of a PDF document

use strict;
use warnings;
use feature 'say';
use CAM::PDF;

my $fn  = 'inlineimage.pdf';
my $pdf = CAM::PDF-> new( $fn ) or die;

my $pagenum = 1;
my $content = $pdf-> getPageContent( $pagenum );

# say $content;
# exit;

# Replace every "/Name Do" (paint a named image resource)
# with the inline version of that image.
$content =~ s{
    (?<= \s ) ( /\S+ ) \s+ Do (?= \s )
}{
    my $obj = $pdf-> dereference( $1, $pagenum );
    # Length may be an indirect reference, which is invalid in content
    delete $obj-> { value }{ value }{ Length };
    $pdf-> writeInlineImage( $obj );
}gxse;

# The image resources are no longer needed once inlined
delete $pdf-> getPage( $pagenum )
    -> { Resources }{ value }{ XObject };
$pdf-> cleanse;
$pdf-> setPageContent( $pagenum, $content );

$fn =~ s/\.pdf$/+$&/i;    # inlineimage.pdf becomes inlineimage+.pdf
$pdf-> cleanoutput( $fn );

I didn't know that inlining images prevents them from being selected in e.g. Reader, thanks. This protection won't help much, though, because any tool that claims to optimize a PDF will attempt to un-inline them.

But, yes, there is a way, with quite a few traps. And you won't get anywhere w/o consulting the manuals.

The test subject is part of the CAM::PDF test suite. Uncomment the 2 commented lines and examine the output first.

The 1st image is shown with the "Do" operator; its argument is the name of a resource. The 2nd image is inline: it is whatever is bracketed between the "BI" and "EI" keywords. The "writeInlineImage" method doesn't write anything anywhere but, given the dereferenced resource object, returns a string to be inserted into the content as the inline version.

String replacement is a very crude approach (parsing into a content tree is advised instead): the matched sequence may happen to be part of actual text content or of binary data (another inline image, whatever).
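To make the false-positive risk concrete, here is a minimal sketch that runs the same pattern over a hand-written content stream. The stream, the `/Im0` resource name and the text string are made up for illustration; no PDF file or CAM::PDF is involved.

```perl
use strict;
use warnings;

# Hypothetical content stream, written by hand for illustration only.
my $content = join "\n",
    'q 100 0 0 50 10 10 cm',
    '/Im0 Do',                                  # real image-painting operator
    'Q',
    'BT /F1 12 Tf (see /Im0 Do above) Tj ET',   # same bytes inside a text string
    '';

# Same pattern as in the script: a name, whitespace, then "Do",
# delimited by whitespace on both sides.
my @hits;
while ( $content =~ m{ (?<= \s ) ( /\S+ ) \s+ Do (?= \s ) }gxs ) {
    push @hits, $1;
}

print scalar( @hits ), " match(es): @hits\n";
```

The pattern matches twice, although only the first occurrence is an operator. With a substitution, the second match would splice image data into the middle of the text string.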

Unsupported inline image dictionary entries are supposed to be ignored. So why do I care to remove the "Length"? In this very file it happens to be not just a number (as in /Length 45) but an indirect object reference (as in /Length 10 0 R). So what? Indirect objects are not allowed in content (the syntax is unknown there, kind of): the "/Length 10" key-value pair is ignored, but "0 R" is unknown to the parser and Reader just stops rendering the page. Arguably, "writeInlineImage" should have taken care of that.
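A quick way to check a generated content string for this trap is to scan the dictionary part of each inline image (between BI and ID) for anything shaped like "N G R". The following is only a sketch: the helper name `indirect_refs_in_inline_dicts` and both sample streams are hypothetical, and it is a crude text scan with the same caveats as any regex over content.

```perl
use strict;
use warnings;

# Return every "N G R" indirect reference found between the BI and ID
# keywords of inline images in a content stream string.
sub indirect_refs_in_inline_dicts {
    my ( $content ) = @_;
    my @found;
    while ( $content =~ m{ (?<! \S ) BI \s (.*?) \s ID (?= \s ) }gxs ) {
        my $dict = $1;    # copy before the inner match resets $1
        push @found, $dict =~ m{ (?<! \S ) ( \d+ \s+ \d+ \s+ R ) (?! \S ) }gxs;
    }
    return @found;
}

# Hand-written samples; "abcd" is a placeholder, not real pixel data.
my $bad  = "q\nBI /W 2 /H 2 /BPC 8 /CS /G /Length 10 0 R ID abcd\nEI\nQ\n";
my $good = "q\nBI /W 2 /H 2 /BPC 8 /CS /G ID abcd\nEI\nQ\n";

for my $sample ( [ bad => $bad ], [ good => $good ] ) {
    my ( $label, $stream ) = @$sample;
    my @refs = indirect_refs_in_inline_dicts( $stream );
    print "$label: ", ( @refs ? join( ', ', @refs ) : 'none' ), "\n";
}
```

Running it on `$content` right before `setPageContent` would show whether any reference survived the inlining.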

Which leads to a further quick and dirty fix: removing the "XObject" entry from the resources, because otherwise CAM::PDF issues a warning about a missing "Length" in the stream dictionary. But it was a good thing to do anyway, filesize-wise, as the image resources are no longer required.

Now both images are un-selectable, as required. See the Reference for further limitations of inlining. You will no doubt encounter them, considering that even this extremely simple test case already hit some.
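One of those limitations is the short list of entries an inline image dictionary may carry. The sketch below sorts the keys of an image dictionary into "keep" and "drop". The key list is my reading of the inline image table in the PDF Reference, so verify it against your copy; `classify_keys` and the sample key list are hypothetical. In the script above, the keys to check would be those of `$obj->{value}{value}`.

```perl
use strict;
use warnings;

# Entries permitted in an inline image dictionary, full names and
# abbreviations (assumption: taken from the PDF Reference, verify it).
my %allowed = map { $_ => 1 } qw(
    BitsPerComponent BPC  ColorSpace CS  Decode D  DecodeParms DP
    Filter F  Height H  ImageMask IM  Intent  Interpolate I  Width W
);

# Split dictionary keys into those an inline image may carry
# and those that have no place in it.
sub classify_keys {
    my ( @keys ) = @_;
    my ( @keep, @drop );
    for my $key ( @keys ) {
        if ( $allowed{ $key } ) { push @keep, $key }
        else                    { push @drop, $key }
    }
    return ( \@keep, \@drop );
}

# Keys of a made-up image XObject dictionary.
my ( $keep, $drop ) = classify_keys(
    qw( Type Subtype Width Height ColorSpace BitsPerComponent Filter Length SMask )
);

print "keep: @$keep\n";
print "drop: @$drop\n";
```

A dropped key is not always harmless: if the image depends on it, inlining may change how the image looks or may not be suitable at all.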

Replies are listed 'Best First'.
Re^2: Merging images into the content of a PDF document
by Arik123 (Beadle) on May 23, 2018 at 07:04 UTC

    Thanks a lot... this solution works... to an extent.

    The images really seem to be inlined. That is, say $content really outputs the binary stream as it should. However, when the PDF is viewed in Reader, no images are shown. It's as if they're drawn in white color on white background... completely invisible, but I'm sure they're there.

    Any idea what could cause the problem?

      It's not "white color on white background", it's emptiness: a syntax error in the content makes Reader abort rendering (silently, because users are not to be alarmed, no-no). The solution works to the extent of the file it was tested with (since you didn't provide any). And, like I said: which entries are allowed in an inline image description, and what values are they allowed to have? Did you check the manual? SMask is definitely not allowed. Rather, it would be ignored if it weren't an indirect object. I suspect the same as above, "0 R" between the BI and ID keywords, but maybe a forbidden colorspace or compression, whatever. Arbitrarily deleting the soft mask will probably result in a change of appearance.

      And it's definitely not Perl anymore :)

Re^2: Merging images into the content of a PDF document
by Arik123 (Beadle) on May 23, 2018 at 08:05 UTC

    A small update - I played a bit with it, and deleted the SMask (right after you delete Length). Now the Chrome plugin shows the images, but Reader still doesn't. That's unfortunate, since I must use Reader...
