in reply to PDL: Looking for efficient way to extract sub-images, by finding bounding boxes of "objects"

I suspect this is one of those cases where vector operations don't buy you anything; indeed, they probably force a lot of extra work.

Once you have your "segmented image" -- ie. each object colored differently -- then a single left to right, top-to bottom pixel scan will discover the bounding boxes of all the objects.

In pseudo code:

use constant { LEFT=>0, RIGHT=>1, TOP=>2, BOTTOM=>3 }; my @bounds; for my $y ( 0 .. $height -1 ){ for my $x ( 0 .. $width - 1 ) { my $pixel = $image->pixel( $x, $y ); $bounds[ $pixel ][ LEFT ] = $x if $x < ( $bounds[ $pixel ][ +LEFT ] // 1e99 ); $bounds[ $pixel ][ RIGHT ] = $x if $x > ( $bounds[ $pixel ][ +RIGHT ] // 0 ); $bounds[ $pixel ][ TOP ] = $y if $y < ( $bounds[ $pixel ][ +TOP ] // 1e99 ); $bounds[ $pixel ][ BOTTOM ] = $y if $y > ( $bounds[ $pixel ][ +BOTTOM ] // 0 ); } }

Unfortunately I cannot help you with applying that to PDL because I gave up on trying to find an efficient way to apply a piece of perl code to all the elements of a piddle.


With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday'
Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
"Science is about questioning the status quo. Questioning authority". The enemy of (IT) success is complexity.
In the absence of evidence, opinion is indistinguishable from prejudice.
  • Comment on Re: PDL: Looking for efficient way to extract sub-images, by finding bounding boxes of "objects"
  • Download Code

Replies are listed 'Best First'.
Re^2: PDL: Looking for efficient way to extract sub-images, by finding bounding boxes of "objects"
by vr (Curate) on Nov 19, 2016 at 07:23 UTC

    You are right, I was hoping to stay (do all work) in PDL domain, i.e. maybe I overlooked function e.g. similar to CORE::index or even List::Util::first.

    Now my solution is this:

    # $s is our segmented image use integer; my ( $w, $h ) = $s-> dims; my $str = ${ $s-> byte-> get_dataref }; my @b = map { [ [ $w, 0 ], [ $h, 0 ] ] } 1 .. $s-> max; while ( $str =~ /[^\x00]/g ) { my $i = pos( $str ) - 1; my $x = $i % $w; my $y = $i / $w; my $c = ord( $& ) - 1; $b[ $c ][ 0 ][ 0 ] = $x if $x < $b[ $c ][ 0 ][ 0 ]; $b[ $c ][ 0 ][ 1 ] = $x if $x > $b[ $c ][ 0 ][ 1 ]; $b[ $c ][ 1 ][ 0 ] = $y if $y < $b[ $c ][ 1 ][ 0 ]; $b[ $c ][ 1 ][ 1 ] = $y if $y > $b[ $c ][ 1 ][ 1 ]; }

    It's fast, and will get slightly more complex for 2-byte "characters", if number of "objects" is more than 255.

    Btw, this single scan with regex engine is twice (or _only_ twice?) as fast as multiple scans with index and rindex (actually 4 scans per "object", on concatenated rows and concatenated columns).

    p.s. But, ehm, direct per-pixel scan (accessing $s-> at( $x, $y )) is 5 times slower than PDL-only solution.

      buk2() below is about 50% faster than buk(), but the best is buk3() which 36x faster than buk() and a cool 7000 times faster than yr(). (That astounded me, and I didn't believe it at first, but it's true!):

      C:\test>1176081 -WIDTH=1000 -HEIGHT=1000 yr() took 295.895124 buk() took 1.483303 buk2() took 1.113015 buk3() took 0.042507

      And a run on a 10kx10k "image" (without yr() as it would take days.):

      C:\test>1176081 -WIDTH=10000 -HEIGHT=10000 yr() took 0.000003 buk() took 150.481585 buk2() took 102.911382 buk3() took 3.710700

      The test is only a crude simulation, so things may not pan out quite so well with the real data, but it worth a look :)

      My test harness:

      #! perl -slw use strict; use Time::HiRes qw[ time ]; use Data::Dump qw[ pp ]; use constant { LEFT=>0, RIGHT=>1, TOP=>2, BOTTOM=>3 }; our $WIDTH //= 1000; our $HEIGHT //= 1000; sub makeObj{ my( $img, $x, $y, $size, $c ) = @_; for my $y1 ( $y - ( $$size / 2 ) .. $y + ( $$size / 2 ) ) { return () unless substr( $$img, $y1*$WIDTH + $x-(($$size+1)/2) +, $$size ) = chr(0)x($$size); } for my $y1 ( $y - ( $$size / 2 ) .. $y + ( $$size / 2 ) ) { substr( $$img, $y1 * $WIDTH + $x-(($$size+1)/2), $$size+2 ) = +$c x($$size+2); } return 1; } sub yr { # use integer; ## using int() below seemed faster than this. my $str = shift; my @b = map { [ [ $WIDTH, 0 ], [ $HEIGHT, 0 ] ] } 1 .. 256; #$s-> +max; while ( $$str =~ /[^\x00]/g ) { my $i = pos( $$str ) - 1; my $x = $i % $WIDTH; my $y = int( $i / $WIDTH ); my $c = ord( $& ) - 1; $b[ $c ][ 0 ][ 0 ] = $x if $x < $b[ $c ][ 0 ][ 0 ]; $b[ $c ][ 0 ][ 1 ] = $x if $x > $b[ $c ][ 0 ][ 1 ]; $b[ $c ][ 1 ][ 0 ] = $y if $y < $b[ $c ][ 1 ][ 0 ]; $b[ $c ][ 1 ][ 1 ] = $y if $y > $b[ $c ][ 1 ][ 1 ]; } return \@b; } sub buk { my( $str ) = @_; my @b = map[ ( 1e99, 0 ) x 2 ], 1 .. 256; my( $i, $x, $y, $c ) = 0; for my $c ( unpack 'C*', $$str ) { $x = $i % $WIDTH; $y = int( $i / $WIDTH ); $b[ $c ][ LEFT ] = $x if $x < $b[ $c ][ LEFT ]; $b[ $c ][ RIGHT ] = $x if $x > $b[ $c ][ RIGHT ]; $b[ $c ][ TOP ] = $y if $y < $b[ $c ][ TOP ]; $b[ $c ][ BOTTOM ] = $y if $y > $b[ $c ][ BOTTOM ]; ++$i; } return \@b; } sub buk2{ my $str = shift; my @b = map[ ( 1e99, 0 ) x 2 ], 1 .. 256; for my $y ( 0 .. $HEIGHT-1 ) { my $x = 0; for my $c ( unpack'C*', substr $$str, $y * $WIDTH, $WIDTH ) { $b[ $c ][ LEFT ] = $x if $x < $b[ $c ][ LEFT ]; $b[ $c ][ RIGHT ] = $x if $x > $b[ $c ][ RIGHT ]; $b[ $c ][ TOP ] = $y if $y < $b[ $c ][ TOP ]; $b[ $c ][ BOTTOM ] = $y if $y > $b[ $c ][ BOTTOM ]; ++$x; } } return \@b; } sub buk3{ my $str = shift; my @b = map[ ( 1e99, 0 ) x 2 ], 1 .. 256; for my $y ( 0 .. $HEIGHT-1 ) { my $x = 0; while( substr( $$str, $y * $WIDTH, $WIDTH ) =~ m[(([^\0])+)]g +) { my $c = ord($1); #, $-[0], $+[0]; $b[ $c ][ LEFT ] = $-[0] if $-[0] < $b[ $c ][ LEFT ]; $b[ $c ][ RIGHT ] = $+[0] if $+[0] > $b[ $c ][ RIGHT ]; $b[ $c ][ TOP ] = $y if $y < $b[ $c ][ TOP ]; $b[ $c ][ BOTTOM ] = $y if $y > $b[ $c ][ BOTTOM ]; ++$x; } } return \@b; } my $pdl = chr(0); $pdl x= ( $WIDTH * $HEIGHT ); my( $x, $y ) = ( $WIDTH/2, $HEIGHT/2 ); for my $c ( 1 .. 255 ) { my $size = 3 + rand( 200 ); my $sizeDiv2 = int( ( $size+1 ) / 2 ); do{ ( $x, $y ) = ( $sizeDiv2 + rand( $WIDTH - $size - 1 ), $sizeDi +v2 + rand( $HEIGHT - $size - 1 ) ) } until substr( $pdl, $y * $WIDTH + $x, 1 ) eq chr( 0 ); redo unless makeObj( \$pdl, $x, $y, \$size, chr( $c ) ); } my $start = time; my $yr = yr \$pdl; my $end = time; printf "yr() took %.6f\n", $end - $start; $start = time; my $buk = buk \$pdl; $end = time; printf "buk() took %.6f\n", $end - $start; $start = time; my $buk2 = buk2 \$pdl; $end = time; printf "buk2() took %.6f\n", $end - $start; $start = time; my $buk3 = buk3 \$pdl; $end = time; printf "buk3() took %.6f\n", $end - $start; #<STDIN>; #pp $buk; pp $buk3;

      With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday'
      Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
      "Science is about questioning the status quo. Questioning authority". The enemy of (IT) success is complexity.
      In the absence of evidence, opinion is indistinguishable from prejudice.
        Thanks a lot for your time and interest. I had to rewrite the loop in "buk3" like this:
        for my $y ( 0 .. $HEIGHT-1 ) { my $s = substr( $$str, $y * $WIDTH, $WIDTH ); while( $s =~ m[(([^\0])+)]g ) { my $c = ord($1); #, $-[0], $+[0]; $b[ $c ][ LEFT ] = $-[0] if $-[0] < $b[ $c ][ LEFT ]; $b[ $c ][ RIGHT ] = $+[0]-1 if $+[0]-1 > $b[ $c ][ RIGHT ]; $b[ $c ][ TOP ] = $y if $y < $b[ $c ][ TOP ]; $b[ $c ][ BOTTOM ] = $y if $y > $b[ $c ][ BOTTOM ]; } }
        Otherwise it goes forever. We can't match globally against substr (as lvalue?), can we? When I run your code (5.24 on Windows), it says:
        yr() took 3.484375 buk() took 1.233143 buk2() took 0.670660 buk3() took 0.029448
        I.e. no hundreds of seconds, for my sub, at all. I would not otherwise publish it here and call it 'fast' :) Also, your test 'image' is something interesting, we can look at it if:
        PDL::IO::Image-> new_from_pdl( pdl([ unpack 'C*', $pdl ]) -> reshape( 1000, 1000 )-> bitnot )-> save( 'buk.png', 'PNG' );
        Not exactly representative as real life image. For typical real image it's this:
        PDL: Short D [7616,1200] max = 145 s/iter unpack buk2 regex buk3 unpack 3.02 -- -6% -36% -88% buk2 2.84 6% -- -32% -87% regex 1.92 57% 48% -- -81% buk3 0.362 733% 684% 430% --
        Anyway, your "buk3" algorithm is fastest.

      You might try this. It expects a reference to the raw string, and the width & height available in globals -- adapt to your preferences:

      sub buk { my( $str ) = @_; my @b = map[ ( 1e99, 0 ) x 2 ], 1 .. 256; #$s-> max; my( $i, $x, $y, $c ) = 0; for my $c ( unpack 'C*', $$str ) { $x = $i % $WIDTH; $y = int( $i / $WIDTH ); $b[ $c ][ LEFT ] = $x if $x < $b[ $c ][ LEFT ]; $b[ $c ][ RIGHT ] = $x if $x > $b[ $c ][ RIGHT ]; $b[ $c ][ TOP ] = $y if $y < $b[ $c ][ TOP ]; $b[ $c ][ BOTTOM ] = $y if $y > $b[ $c ][ BOTTOM ]; ++$i; } return \@b; }

      In my simulated tests, it performs 224 times faster than your posted code on a 1000x1000x1byte image and 1700 times faster on a 2000x2000:

      C:\test>1176081 -WIDTH=999 -HEIGHT=999 yr() took 329.728896 buk() took 1.474515 C:\test>1176081 -WIDTH=2000 -HEIGHT=2000 yr() took 10265.728694 buk() took 5.976180

      It is also trivially adaptable to images that use short/word/doubleword/quadword sized pixels.

      I have a couple of ideas for a faster version, which I'll post if they pan out.


      With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday'
      Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
      "Science is about questioning the status quo. Questioning authority". The enemy of (IT) success is complexity.
      In the absence of evidence, opinion is indistinguishable from prejudice.