sureshrps has asked for the wisdom of the Perl Monks concerning the following question:
Re: Count Colour Pages in PDF
by Corion (Patriarch) on Nov 09, 2009 at 09:52 UTC
What do you mean by "without opening the pdf"? You will need to use the open function to read the file. If you just don't want to "open a window for the user", that's likely possible. A cursory glance through PDF::API2 doesn't turn up anything ready-made, so you may have to inspect all page objects and check whether each one is an image ("likely" colored) or uses a non-grayscale color.
|
Re: Count Colour Pages in PDF
by marto (Cardinal) on Nov 09, 2009 at 09:56 UTC
I seriously doubt you will find a way to examine the contents of a PDF file without actually opening the PDF in some manner. That isn't going to happen. Are you counting pages which contain colour text/images, or do these PDF files consist of one image per page? Perhaps PDF::API2's colourspace methods may be of use. Martin
|
Re: Count Colour Pages in PDF
by almut (Canon) on Nov 09, 2009 at 17:40 UTC
I presume that with "without open" you mean "without opening the file in a PDF viewer and eyeballing its page contents" (as others have pointed out, you'd of course have to open/read the file somehow in order to analyse its contents).

That said, I'm not aware of an easy way to solve the task with any of the existing PDF modules on CPAN. Even though there may be ways of identifying the color space being used, this doesn't necessarily help to detect whether the page is using color; for example, it's rather common to draw nothing but black in the RGB color space... In other words, you'd have to check the effective color of every single PDF drawing/imaging instruction, which would be pretty cumbersome.

Personally, I would approach the problem by rendering each page to an image, converting that image to grayscale, and comparing the two, the idea being that there is no difference if the original image/page didn't have any color in the first place. As usual, there are several ways to implement this. One way would be to use Ghostscript and a few of the good old Netpbm tools:
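The following is only a minimal sketch of that idea, not necessarily the script that produced any output quoted in this thread. It assumes `gs`, `ppmtopgm`, `pgmtoppm` and `pnmpsnr` are on the PATH. The function names, the 72 dpi resolution, the temporary file names and the rule "three lines saying doesn't differ means gray" are all assumptions made for illustration. Rather than printing the raw pnmpsnr reports, it tallies the pages whose report shows a difference.

```shell
#!/bin/sh
# Sketch only: names, resolution and the assumed report format are illustrative.

# Reads a pnmpsnr report on stdin. Succeeds (exit 0) when at least three lines
# say "doesn't differ" (assumed: one each for Y, Cb and Cr), i.e. no color.
report_is_gray() {
    n=$(grep -c "doesn't differ")
    [ "$n" -ge 3 ]
}

# Renders every page of "$1" (PDF or PostScript) to a PPM image, compares each
# page with a grayscale copy of itself and prints the number of color pages.
count_color_pages() {
    tmp=$(mktemp -d) || return 1
    gs -q -dNOPAUSE -dBATCH -sDEVICE=ppmraw -r72 \
       -sOutputFile="$tmp/page-%03d.ppm" "$1" || { rm -rf "$tmp"; return 1; }
    color=0
    for page in "$tmp"/page-*.ppm; do
        [ -e "$page" ] || continue
        # Grayscale copy, turned back into a PPM so both files share a format.
        ppmtopgm "$page" | pgmtoppm white > "$tmp/gray.ppm"
        # pnmpsnr may report on stderr, so both streams are captured.
        if ! pnmpsnr "$page" "$tmp/gray.ppm" 2>&1 | report_is_gray; then
            color=$((color + 1))
        fi
    done
    rm -rf "$tmp"
    echo "$color"
}
```

After sourcing the file, `count_color_pages input.pdf` would print a single number. A low resolution keeps the rendering fast, at the risk of missing very small colored details.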
With the following sample input [1] consisting of three pages (two pages in black-and-white/gray, one page in color)
you'd get this output (from the diff-ing tool, pnmpsnr):
"Doesn't differ" for both the luminance and the color components (see YCbCr) means in this case that there was no color on the respective page. I'll leave it as an exercise for the reader to write a little Perl wrapper that parses this output (and, optionally, to rewrite the above shell script in Perl).

[1] I'm using PostScript input here (for brevity). PDF should work, too, of course (if you don't believe me, you can convert the sample input to PDF using the ps2pdf tool that comes with gs :)