in reply to Count Colour Pages in PDF

I presume that with "without open" you mean "without opening the file in a PDF viewer and eyeballing its page contents" (as others have pointed out, you'd of course have to open/read the file somehow in order to analyse its contents).  That said, I'm not aware of an easy way to solve the task with any of the existing PDF modules on CPAN. Even though there may be ways of identifying the color space being used, this doesn't necessarily help to detect if the page is using color — for example, it's rather common to draw nothing but black in the RGB color space...  In other words, you'd have to check the effective color of every single PDF drawing/imaging instruction, which would be pretty cumbersome.

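Personally, I would approach the problem as follows: render each page to a raster image, convert a copy of that image to grayscale, and compare the copy with the original, the idea being that there is no difference if the original image/page didn't have any color in the first place.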
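To make the criterion concrete: a pixel is colorless exactly when its red, green and blue samples are equal, so a page is free of color if that holds for every pixel. Here is a minimal sketch of that test, assuming the page image is available in the plain (ASCII, "P3") PPM format. The function name has_color is made up for illustration, and the whole image is slurped into memory, which is only sensible for small, low-resolution renderings.

```shell
#!/bin/sh
# has_color: hypothetical helper, reads a plain (ASCII "P3") PPM on stdin.
# Exit status is 0 if at least one pixel has unequal R, G, B samples,
# and 1 if every pixel is a shade of gray.
has_color() {
    awk '
        {
            sub(/#.*/, "")                          # strip PPM comments
            for (i = 1; i <= NF; i++) tok[n++] = $i # collect all tokens
        }
        END {
            # tok[0]="P3", tok[1]=width, tok[2]=height, tok[3]=maxval;
            # the samples follow as R G B triples
            for (i = 4; i + 2 < n; i += 3)
                if (tok[i]+0 != tok[i+1]+0 || tok[i+1]+0 != tok[i+2]+0)
                    exit 0                          # found a colored pixel
            exit 1                                  # gray throughout
        }
    '
}
```

It would be called as `has_color < page.ppm && echo "page has color"`, with page.ppm being a placeholder name for a page rendered in plain PPM format.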

As usual, there are several ways to implement this. One way would be to use Ghostscript and a few of the good old Netpbm tools:

    #!/bin/sh

    infile=$1
    prefix=tmp-page

    rm -f $prefix*p?m   # clean up

    # convert pages to raster images
    gs -sDEVICE=ppmraw -r30 -sOutputFile=$prefix%03d.ppm -dNOPAUSE -dBATCH -q "$infile"

    for img in $prefix*.ppm ; do                   # for each page
        ppmtopgm $img > $img.pgm                   # convert to grayscale
        pgmtoppm '#fff' $img.pgm > $img.pgm.ppm    # convert back to RGB
        pnmpsnr $img $img.pgm.ppm                  # diff
    done

With the following sample input[1] consisting of three pages (two pages in black&white/gray, one page in color)

    %!PS
    /Helvetica findfont 50 scalefont setfont
    /text (PerlMonks rocks!) def

    % page 1 - black
    100 500 moveto text show
    100 400 moveto text show
    showpage

    % page 2 - gray
    0.5 setgray
    100 500 moveto text show
    100 400 moveto text show
    showpage

    % page 3 - color (black and red)
    0 setgray
    100 500 moveto text show
    1 0 0 setrgbcolor  % red
    100 400 moveto text show
    showpage

you'd get this output (from the diff-ing tool, pnmpsnr):

    pnmpsnr: PSNR between tmp-page001.ppm and tmp-page001.ppm.pgm.ppm:
    pnmpsnr: Y  color component doesn't differ.
    pnmpsnr: Cb color component doesn't differ.
    pnmpsnr: Cr color component doesn't differ.
    pnmpsnr: PSNR between tmp-page002.ppm and tmp-page002.ppm.pgm.ppm:
    pnmpsnr: Y  color component doesn't differ.
    pnmpsnr: Cb color component doesn't differ.
    pnmpsnr: Cr color component doesn't differ.
    pnmpsnr: PSNR between tmp-page003.ppm and tmp-page003.ppm.pgm.ppm:
    pnmpsnr: Y  color component: 81.71 dB
    pnmpsnr: Cb color component: 35.86 dB
    pnmpsnr: Cr color component: 26.43 dB

"doesn't differ" in both the luminance and color components (see YCbCr) in this case means that there was no color on the respective page.

I'll leave it as an exercise for the reader to write a little Perl wrapper that parses this output (and, optionally, rewrite the above shell script in Perl).
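For readers who'd rather not start from scratch, here is a minimal sketch of the parsing step, written as a shell/awk function rather than Perl to stay close to the script above. The function name color_pages is made up, and it assumes that pnmpsnr's messages look exactly like the sample output above; other Netpbm versions may word them differently.

```shell
#!/bin/sh
# color_pages: hypothetical helper, reads pnmpsnr messages (in the format
# of the sample output) on stdin and prints the name of each page image
# for which at least one component is reported with a dB value, i.e. for
# which the grayscale round trip changed something.
color_pages() {
    awk '
        /PSNR between/ {
            page = $4          # first file name after "PSNR between"
            next
        }
        / color component: .* dB/ && !(page in seen) {
            seen[page] = 1     # report every page only once
            print page
        }
    '
}
```

Netpbm tools normally write such messages to stderr, so the script's output would presumably have to be redirected first, as in `sh colorcheck.sh input.pdf 2>&1 | color_pages | wc -l` to count the color pages (colorcheck.sh being a hypothetical name for the script above).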

___

[1] I'm using PostScript input here (for brevity); PDF should work, too, of course (if you don't believe me, you can convert the sample input to PDF using the ps2pdf tool that comes with gs :)