Re^9: Why this code is so slow if run in thread?

One last thought. If you are still hankering for more speed, compile your own PDL modules, and are comfortable with C, then you could look at making a custom version of the ccNcompt() routine in PDL::Image2D that accumulates the component bounds in the same pass as it discovers them.

On the basis of a quick look at the source it wouldn't be too hard to modify the routine to do the accumulation; though there are a few complications.

Merging bounds when merging equivalences.
The way their algorithm works, different parts of a single component can be labeled with different values as the scan progresses, and these aliases are then resolved at the end.
You would need to merge the bounds of the aliased parts at that same time.
The PDL source code environment is quite complex.
It appears that the C source code is actually embedded as strings within Perl source code and subjected to some kind of templating mechanism; those Perl sources are then executed to generate the C sources, which are then compiled.
I've worked on a similar source generation mechanism in the past and a) it can be very difficult to understand how the different phases work together; b) it can be a nightmare to debug.
Arranging to return the bounds information to the caller.
It was not obvious to me how (or even whether it is possible) to return two separate data structures -- the existing 'colored' image and the required AoA of bounds data -- from the routine.
If your need is such that this idea is attractive, then you would definitely need the help of the PDL devs.
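To make the first complication concrete, here is a minimal sketch in plain Python (not PDL's C code, and not how ccNcompt() is actually written) of labelling with a union-find table where each provisional label carries its own bounding box, and the boxes are merged at the moment two labels are found to be aliases. The function name label_with_bounds, the 4-connectivity, and the (min_x, min_y, max_x, max_y) layout are all my own assumptions for illustration.

```python
def label_with_bounds(img):
    """Label 4-connected components of a 0/1 image (list of rows).

    Returns (labels, bounds): labels is an image of component ids,
    bounds maps each id to (min_x, min_y, max_x, max_y).
    Hypothetical helper for illustration only.
    """
    h, w = len(img), len(img[0])
    labels = [[0] * w for _ in range(h)]
    parent = [0]   # union-find table; index 0 is the background
    box = [None]   # box[l] is only meaningful while l is a root

    def find(a):
        while parent[a] != a:
            parent[a] = parent[parent[a]]  # path halving
            a = parent[a]
        return a

    def union(a, b):
        ra, rb = find(a), find(b)
        if ra == rb:
            return ra
        if rb < ra:
            ra, rb = rb, ra
        parent[rb] = ra
        # Merge the bounds at the same time as the equivalence.
        p, q = box[ra], box[rb]
        box[ra] = [min(p[0], q[0]), min(p[1], q[1]),
                   max(p[2], q[2]), max(p[3], q[3])]
        return ra

    for y in range(h):
        for x in range(w):
            if not img[y][x]:
                continue
            left = labels[y][x - 1] if x > 0 else 0
            up = labels[y - 1][x] if y > 0 else 0
            if left and up:
                lab = union(left, up)
            elif left or up:
                lab = find(left or up)
            else:
                lab = len(parent)          # new provisional label
                parent.append(lab)
                box.append([x, y, x, y])
            labels[y][x] = lab
            b = box[lab]                   # lab is always a root here
            b[0] = min(b[0], x)
            b[1] = min(b[1], y)
            b[2] = max(b[2], x)
            b[3] = max(b[3], y)

    # Resolve aliases; the bounds need no further work at this point.
    for y in range(h):
        for x in range(w):
            if labels[y][x]:
                labels[y][x] = find(labels[y][x])
    bounds = {l: tuple(box[l]) for l in range(1, len(parent)) if find(l) == l}
    return labels, bounds


# A 'U' shape whose two arms start out with different labels,
# plus one isolated pixel.
img = [
    [1, 0, 1, 0, 0],
    [1, 0, 1, 0, 0],
    [1, 1, 1, 0, 1],
]
labels, bounds = label_with_bounds(img)
```

The point of the design is that the final alias-resolution pass only has to rewrite labels; the boxes are already complete because every merge of labels merged the boxes too. Returning both structures is trivial in Python; how to do the equivalent from a PDL::PP routine is exactly the open question above.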

Thanks for posting such an interesting problem. It has given me much mental stimulation over the last week or so.

One final, final thought. Many years ago, I wrote some OCR routines in 6502 assembler for a BBC micro. The first part of that process was to isolate the individual characters, essentially the same task as this. However, I was lucky inasmuch as the stuff I was dealing with was handwritten text and crosses filled into predefined boxes on a form -- multiple choice question papers with boxes for choices, names, id numbers etc. -- and the positions of those boxes were known a priori to a high degree of accuracy. That made my life simple -- for the first stage at least.

It is clear from your sample image that you're not dealing with simple text, but there also appear to be registration marks on the image. If that is true for all your samples, and they are consistent, it might be possible to predefine the areas of interest, rather than needing to discover them anew for each image, which would greatly speed up your task.
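As a rough illustration of what predefining the areas of interest could look like, here is a small Python sketch. Everything in it is hypothetical: the function name crop_regions, the region table, and the idea that a detected registration mark gives you a single (dx, dy) offset to apply to boxes measured on a reference image. Real scans may also need rotation or scaling, which this ignores.

```python
def crop_regions(img, regions, offset=(0, 0)):
    """Cut fixed, named boxes out of an image (list of rows).

    regions maps a name to (x0, y0, x1, y1), inclusive, measured on a
    reference image; offset is the (dx, dy) shift of this scan relative
    to that reference, e.g. taken from a registration mark.
    """
    dx, dy = offset
    out = {}
    for name, (x0, y0, x1, y1) in regions.items():
        rows = img[y0 + dy : y1 + dy + 1]
        out[name] = [row[x0 + dx : x1 + dx + 1] for row in rows]
    return out


# Toy 4x4 image whose pixel values encode their position (row*10 + col).
page = [[r * 10 + c for c in range(4)] for r in range(4)]
crops = crop_regions(page, {"id_box": (1, 1, 2, 2)}, offset=(1, 0))
```

With the boxes known in advance, the per-image work reduces to finding the registration marks and slicing, with no component discovery at all.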

With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday'

Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.

"Science is about questioning the status quo. Questioning authority". The enemy of (IT) success is complexity.

In the absence of evidence, opinion is indistinguishable from prejudice.

Re^10: Why this code is so slow if run in thread? by vr (Curate) on Dec 16, 2016 at 16:11 UTC
Thanks a lot for your thoughts, and I'm glad you liked the question :). The speed as it is now, maybe up to 5 sec per image, is _quite_ OK. It takes that long because the application scans the 4 narrow edges of a large image (to be 100% sure), in both directions (orientations), i.e. 8 partial scans per image. That is very easy to do with PDL slicing and "get_dataref", which handle all the rotation and cropping. The text is, indeed, simple, but its size, position, etc. are unknown. Spending time on the PDL source (for which I'm not qualified enough, anyway) would be overkill. It was just that, BEFORE knowing why it was so slow, I was a little frustrated that 2 workers, i.e. threads (no more than 2, because they also do some other stuff and can consume lots of memory; max_workers is set to 2 for my descendant of IO::Async::Function, in an IO::Async application), each one processing just a single file, blocked the whole queue for _more than 40 minutes_. But now it is OK.
Re^10: Why this code is so slow if run in thread? by etj (Priest) on May 18, 2022 at 23:48 UTC
The easiest way to make a custom PDL::PP function is to use Inline::Pdlpp: see https://github.com/Fourmilab/floating_point_benchmarks/pull/1/files for a working example.