I'm not aware of any canned module for extracting black rectangles from PDFs. I could be wrong of course, but I'm afraid you'll have to do some low-level coding yourself, similar in spirit to what you've already tried, i.e. searching the content streams for re (rectangle) operators, or for certain combinations of line-drawing and fill operators.

Personally, when it comes to low-level messing with PDFs, I'm a big fan of pdftk, as it allows easy uncompressing of the PDF's content streams. For example, doing the following

$ pdftk GBA.pdf output - uncompress | grep ' re$' | wc
   4730   23650  132978

counts 4730 rectangle drawing instructions (though they may of course not all qualify as redaction rectangles...)
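To narrow those hits down to likely redaction candidates, one could track the current fill colour and keep only rectangles that are actually filled while it is black. The following is just a rough sketch, not a tested tool: it assumes the uncompressed stream has one operator per line (as the grep above implies), it ignores q/Q graphics-state nesting and the sc/scn colour operators, and the function names are made up for illustration.

```python
import re

# "x y w h re" at the end of a line, same pattern the grep above looks for
RECT = re.compile(r'([-+]?\d*\.?\d+)\s+([-+]?\d*\.?\d+)\s+'
                  r'([-+]?\d*\.?\d+)\s+([-+]?\d*\.?\d+)\s+re$')
FILL_OPS = {"f", "F", "f*", "B", "B*", "b", "b*"}   # operators that fill the path
PATH_END = {"S", "s", "n"}                          # stroke only / discard path

def is_black(line):
    """True if a g / rg / k colour line sets the fill colour to black."""
    *operands, op = line.split()
    try:
        vals = [float(v) for v in operands]
    except ValueError:
        return False
    if op == "g":
        return vals == [0.0]
    if op == "rg":
        return vals == [0.0, 0.0, 0.0]
    if op == "k":
        return vals == [0.0, 0.0, 0.0, 1.0]
    return False

def black_filled_rects(lines):
    """Yield (x, y, w, h) for rectangles filled while the fill colour is black."""
    black = True      # the default fill colour in PDF is black
    pending = []      # rectangles in the current path, not yet painted
    for raw in lines:
        line = raw.strip()
        m = RECT.search(line)
        if m:
            pending.append(tuple(float(v) for v in m.groups()))
        elif line in FILL_OPS:
            if black:
                yield from pending
            pending = []
        elif line in PATH_END:
            pending = []
        elif line.endswith((" g", " rg", " k")):
            black = is_black(line)

sample = ["0 g", "10 20 30 40 re", "f",      # black, filled: kept
          "1 0 0 rg", "1 2 3 4 re", "f",     # red, filled: skipped
          "0 0 0 rg", "5 5 5 5 re", "S"]     # black, but only stroked: skipped
found = list(black_filled_rects(sample))
```

On real data the lines would come from the pdftk output, read as bytes and decoded as latin-1, since the stream also contains binary font and image data. Whether a black filled rectangle is really a redaction still has to be judged from its size and position.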

Anyhow, what's the idea behind extracting those rectangles, i.e. what are you really trying to do? Maybe there is some entirely different approach to solving your problem.


In reply to Re: Finding Rectangles in PDFs by almut
in thread Finding Rectangles in PDFs by binarybits
