PerlMonks  

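use strict;
use warnings;
use feature 'say';
use CAM::PDF;

my $fn      = 'inlineimage.pdf';
my $pdf     = CAM::PDF-> new( $fn ) or die;
my $pagenum = 1;
my $content = $pdf-> getPageContent( $pagenum );

# Uncomment these two lines to examine the raw page content first.
# say $content;
# exit;

# Replace every "/Name Do" with the inline version of that image.
$content =~ s{
    (?<= \s ) ( /\S+ ) \s+ Do (?= \s )
}{
    my $obj = $pdf-> dereference( $1, $pagenum );
    # "Length" may be an indirect reference, which is illegal in content.
    delete $obj-> { value }{ value }{ Length };
    $pdf-> writeInlineImage( $obj );
}gxse;

# The image resources are no longer needed.
delete $pdf-> getPage( $pagenum )
    -> { Resources }{ value }{ XObject };
$pdf-> cleanse;
$pdf-> setPageContent( $pagenum, $content );

# Save as e.g. "inlineimage+.pdf".
$fn =~ s/\.pdf$/+$&/i;
$pdf-> cleanoutput( $fn );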

I didn't know that inlining images prevents them from being selected in e.g. Reader, thanks. This protection won't help much, though, because any tool that claims to optimize a PDF will attempt to un-inline them.

But, yes, there is a way, with quite a few traps. And you won't get anywhere w/o consulting the manuals.

The test subject (inlineimage.pdf) is part of the CAM::PDF test suite. Uncomment the 2 commented-out lines (say and exit) and examine the output first.

The 1st image is shown with the "Do" operator; its argument is the name of a resource. The 2nd image is inline: it is whatever is bracketed between the "BI" and "EI" keywords. The "writeInlineImage" method doesn't write anything anywhere but, given the image object that the resource name dereferences to, returns a string to be inserted into the content as the inline version.
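To make the two forms concrete, here is a minimal sketch in plain Perl (no CAM::PDF needed). The content stream is hand-written for illustration, not copied from inlineimage.pdf; the resource name /Im1 and the helper xobject_names are assumptions. It applies the same pattern as the script to list the names painted with "Do".

```perl
use strict;
use warnings;

# Hand-written sample; the real page content will differ.
my $sample = join "\n",
    'q 100 0 0 100 50 600 cm',
    '/Im1 Do',                      # image XObject, referenced by resource name
    'Q',
    'q 100 0 0 100 50 450 cm',
    'BI /W 2 /H 2 /BPC 8 /CS /G',   # inline image: dictionary entries ...
    'ID abcd',                      # ... then the raw sample data ...
    'EI',                           # ... closed by EI
    'Q',
    '';

# Hypothetical helper: names that appear as the operand of "Do".
sub xobject_names {
    my ( $content ) = @_;
    my @names = $content =~ m{ (?<= \s ) ( /\S+ ) \s+ Do (?= \s ) }gxs;
    return @names;
}

print join( ', ', xobject_names( $sample ) ), "\n";
```

Only the first image is reported; the inline one has no resource name at all, which is the whole point of inlining.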

String replacement is a very crude approach (parsing the content into a tree is advised instead): the matched sequence may happen to be part of actual text content or of binary data (another inline image, whatever).
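A small sketch of that hazard, again in plain Perl with a hand-written content stream (the helper count_do_matches and the sample are assumptions made for illustration). The same character sequence appears once as a real operator and once inside a text string, and the pattern cannot tell them apart.

```perl
use strict;
use warnings;

# Same pattern as in the script, wrapped in a hypothetical helper
# that counts how many times it would fire.
sub count_do_matches {
    my ( $content ) = @_;
    my $count = () = $content =~ m{ (?<= \s ) /\S+ \s+ Do (?= \s ) }gxs;
    return $count;
}

# Hand-written sample: one real "Do", plus the same character
# sequence inside a text string shown with Tj.
my $tricky = join "\n",
    'q /Im1 Do Q',
    'BT /F1 12 Tf 72 700 Td (type /Im1 Do to paint) Tj ET',
    '';

print count_do_matches( $tricky ), "\n";   # counts both occurrences
```

In the script, the second hit would be replaced with image data in the middle of a text string, which is why working on a parsed content tree is the safer route.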

Unsupported inline image dictionary entries are supposed to be ignored, so why do I care to remove the "Length"? In this very file it happens to be not just a number (as in /Length 45) but an indirect reference (as in /Length 10 0 R). So what? Indirect references are not allowed in content (unknown syntax there, kind of): the "/Length 10" key-value pair is ignored, but "0 R" is unknown to the parser, and Reader just stops rendering the page. Arguably, "writeInlineImage" should have taken care of that itself.
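As a sanity check after inlining, something like the following could flag the problem before Reader does. It is only a sketch in plain Perl: the helper inline_headers_with_refs is hypothetical, the samples are hand-written, and the naive BI ... ID scan shares all the weaknesses of string matching on content.

```perl
use strict;
use warnings;

# Hypothetical helper: return the header (the part between BI and ID) of
# every inline image that still contains an indirect reference "N G R".
sub inline_headers_with_refs {
    my ( $content ) = @_;
    my @bad;
    while ( $content =~ m{ (?<! \S ) BI \s+ ( .*? ) \s+ ID \s }gxs ) {
        my $header = $1;
        push @bad, $header
            if $header =~ m{ (?<! \S ) \d+ \s+ \d+ \s+ R (?! \S ) }x;
    }
    return @bad;
}

# Hand-written samples, not taken from inlineimage.pdf.
my $broken = "q BI /W 2 /H 2 /BPC 8 /CS /G /Length 10 0 R ID abcd EI Q\n";
my $fine   = "q BI /W 2 /H 2 /BPC 8 /CS /G ID abcd EI Q\n";

printf "%d problem header(s) in broken sample\n",
    scalar inline_headers_with_refs( $broken );
printf "%d problem header(s) in fine sample\n",
    scalar inline_headers_with_refs( $fine );
```

Deleting "Length" from the object before calling "writeInlineImage", as the script does, avoids the issue at the source; this check merely confirms nothing slipped through.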

Which leads to a further quick and dirty fix: removing the "XObject" entry from the page resources, because otherwise CAM::PDF issues a warning about a missing "Length" in the stream dictionary. It was a good thing to do anyway, filesize-wise, as the image resources are no longer required.

Now both images are un-selectable, as required. See the PDF Reference for further limitations of inlining; you will no doubt encounter them, considering that even this extremely simple test case already ran into some.


In reply to Re: Merging images into the content of a PDF document by vr
in thread Merging images into the content of a PDF document by Arik123
