One last thought. If you are still hankering for more speed, compile your own PDL modules and are comfortable with C, then you could look at making a custom version of the ccNcompt() routine in PDL::Image2D that accumulates the component bounds in the same pass in which it discovers them.

From a quick look at the source, it wouldn't be too hard to modify the routine to do the accumulation, though there are a few complications:

  1. Merging bounds when merging equivalences.

    Because of the way their algorithm works, different parts of a single component can be labeled with different values as the scan progresses; these aliases are then resolved at the end.

    You would need to merge the bounds of the aliased parts at that same time.

  2. The PDL source code environment is quite complex.

    It appears that the C source code is embedded as strings within Perl source code and passed through some kind of templating mechanism; those Perl sources are then executed to generate the C sources, which are finally compiled.

    I've worked on a similar source-generation mechanism in the past, and a) it can be very difficult to understand how the different phases work together; and b) it can be a nightmare to debug.

  3. Arranging to return the bounds information to the caller.

    It was not obvious to me how (or even whether it is possible) to return two separate data structures -- the existing 'colored' image and the required AoA of bounds data -- from the routine.

    If your need is such that this idea is attractive, then you would definitely need the help of the PDL devs.
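To make points 1 and 3 concrete, here is a minimal sketch in plain C of where the accumulation and the merge would go. It assumes a generic two-pass union-find labeller with 4-connectivity on a row-major byte image; it is not the actual ccNcompt() source, and `label_with_bounds()` and `Bounds` are hypothetical names chosen for illustration.

```c
#include <assert.h>
#include <stdlib.h>

/* Inclusive bounding box of one connected component. */
typedef struct { int xmin, ymin, xmax, ymax; } Bounds;

/* Union-find "find" with path halving; parent[] is indexed by provisional label. */
static int find(int *parent, int x) {
    while (parent[x] != x) {
        parent[x] = parent[parent[x]];
        x = parent[x];
    }
    return x;
}

/* Record that labels a and b alias one component; the smaller root wins,
   so a root is always the smallest provisional label in its set. */
static void unite(int *parent, int a, int b) {
    a = find(parent, a);
    b = find(parent, b);
    if (a < b) parent[b] = a;
    else if (b < a) parent[a] = b;
}

/* Enlarge dst so that it also covers src. */
static void merge(Bounds *dst, const Bounds *src) {
    if (src->xmin < dst->xmin) dst->xmin = src->xmin;
    if (src->ymin < dst->ymin) dst->ymin = src->ymin;
    if (src->xmax > dst->xmax) dst->xmax = src->xmax;
    if (src->ymax > dst->ymax) dst->ymax = src->ymax;
}

/* Hypothetical routine: label the 4-connected nonzero pixels of a w x h image.
   labels[] receives final labels 1..n (0 = background); *out receives a
   malloc'd array where (*out)[k] is the bounding box of label k+1.
   Returns n, or -1 if allocation fails. */
int label_with_bounds(const unsigned char *img, int w, int h,
                      int *labels, Bounds **out) {
    assert(w >= 0 && h >= 0);
    size_t npix = (size_t)w * (size_t)h, cap = npix + 1; /* label 0 = background */
    int *parent = malloc(cap * sizeof *parent);
    int *final = malloc(cap * sizeof *final);
    Bounds *pb = malloc(cap * sizeof *pb);
    Bounds *res = malloc(cap * sizeof *res);
    if (!parent || !final || !pb || !res) {
        free(parent); free(final); free(pb); free(res);
        return -1;
    }
    int next = 1;

    /* Pass 1: provisional labels, alias records, and per-label bounds. */
    for (int y = 0; y < h; y++) {
        for (int x = 0; x < w; x++) {
            size_t i = (size_t)y * w + x;
            if (!img[i]) { labels[i] = 0; continue; }
            int left = x > 0 ? labels[i - 1] : 0;
            int up = y > 0 ? labels[i - w] : 0;
            int lab;
            if (left && up) {
                lab = left < up ? left : up;
                unite(parent, left, up);   /* two parts of one component meet */
            } else if (left || up) {
                lab = left ? left : up;
            } else {
                lab = next++;
                parent[lab] = lab;
                pb[lab] = (Bounds){x, y, x, y};
            }
            labels[i] = lab;
            merge(&pb[lab], &(Bounds){x, y, x, y});  /* accumulate in the same pass */
        }
    }

    /* Point 1: when resolving aliases, fold each non-root label's bounds into its root's. */
    for (int l = 1; l < next; l++) {
        int r = find(parent, l);
        if (r != l) merge(&pb[r], &pb[l]);
    }

    /* Number the roots 1..n and collect their now-complete bounds. */
    int n = 0;
    for (int l = 1; l < next; l++)
        if (find(parent, l) == l) { final[l] = ++n; res[n - 1] = pb[l]; }

    /* Pass 2: rewrite provisional labels to the compact final ones. */
    for (size_t i = 0; i < npix; i++)
        if (labels[i]) labels[i] = final[find(parent, labels[i])];

    free(parent); free(final); free(pb);
    *out = res;
    return n;
}
```

Because every foreground pixel contributes to exactly one provisional label's box, and all provisional labels of a component end up under the same root, folding the non-root boxes into the root gives each component's full bounds at no extra scan. Here the bounds come back through a separate out-parameter; how to do the equivalent through PDL's generated interface is exactly the open question in point 3.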

Thanks for posting such an interesting problem. It has given me much mental stimulation over the last week or so.

One final, final thought. Many years ago, I wrote some OCR routines in 6502 assembler for a BBC Micro. The first part of that process was to isolate the individual characters, essentially the same task as this. However, I was lucky inasmuch as the material I was dealing with was handwritten text and crosses filled into predefined boxes on a form -- multiple-choice question papers with boxes for choices, names, ID numbers, etc. -- and the positions of those boxes were known a priori to a high degree of accuracy. That made my life simple, for the first stage at least.

It is clear from your sample image that you're not dealing with simple text, but there also appear to be registration marks on the image. If that is true for all your samples, and the marks are consistent, it might be possible to predefine the areas of interest rather than discovering them anew for each image, which would greatly speed up your task.



In reply to Re^9: Why this code is so slow if run in thread? by BrowserUk
in thread Why this code is so slow if run in thread? by vr
