Re: PDL: Looking for efficient way to extract sub-images, by finding bounding boxes of "objects"

Replies are listed 'Best First'.
Re^2: PDL: Looking for efficient way to extract sub-images, by finding bounding boxes of "objects" by vr (Curate) on Nov 19, 2016 at 07:23 UTC
You are right, I was hoping to stay (do all work) in PDL domain, i.e. maybe I overlooked function e.g. similar to CORE::index or even List::Util::first. Now my solution is this: `# $s is our segmented image use integer; my ( $w, $h ) = $s-> dims; my $str = ${ $s-> byte-> get_dataref }; my @b = map { [ [ $w, 0 ], [ $h, 0 ] ] } 1 .. $s-> max; while ( $str =~ /[^\x00]/g ) { my $i = pos( $str ) - 1; my $x = $i % $w; my $y = $i / $w; my $c = ord( $& ) - 1; $b[ $c ][ 0 ][ 0 ] = $x if $x < $b[ $c ][ 0 ][ 0 ]; $b[ $c ][ 0 ][ 1 ] = $x if $x > $b[ $c ][ 0 ][ 1 ]; $b[ $c ][ 1 ][ 0 ] = $y if $y < $b[ $c ][ 1 ][ 0 ]; $b[ $c ][ 1 ][ 1 ] = $y if $y > $b[ $c ][ 1 ][ 1 ]; }` [download] It's fast, and will get slightly more complex for 2-byte "characters", if number of "objects" is more than 255. Btw, this single scan with regex engine is twice (or _only_ twice?) as fast as multiple scans with index and rindex (actually 4 scans per "object", on concatenated rows and concatenated columns). p.s. But, ehm, direct per-pixel scan (accessing $s-> at( $x, $y )) is 5 times slower than PDL-only solution.	[reply] [d/l]
Re^3: PDL: Looking for efficient way to extract sub-images, by finding bounding boxes of "objects" (7000x faster) by BrowserUk (Patriarch) on Nov 20, 2016 at 13:57 UTC
buk2() below is about 50% faster than buk(), but the best is buk3() which 36x faster than buk() and a cool 7000 times faster than yr(). (That astounded me, and I didn't believe it at first, but it's true!): `C:\test>1176081 -WIDTH=1000 -HEIGHT=1000 yr() took 295.895124 buk() took 1.483303 buk2() took 1.113015 buk3() took 0.042507` [download] And a run on a 10kx10k "image" (without yr() as it would take days.): `C:\test>1176081 -WIDTH=10000 -HEIGHT=10000 yr() took 0.000003 buk() took 150.481585 buk2() took 102.911382 buk3() took 3.710700` [download] The test is only a crude simulation, so things may not pan out quite so well with the real data, but it worth a look :) My test harness: #! perl -slw use strict; use Time::HiRes qw[ time ]; use Data::Dump qw[ pp ]; use constant { LEFT=>0, RIGHT=>1, TOP=>2, BOTTOM=>3 }; our $WIDTH //= 1000; our $HEIGHT //= 1000; sub makeObj{ my( $img, $x, $y, $size, $c ) = @_; for my $y1 ( $y - ( $$size / 2 ) .. $y + ( $$size / 2 ) ) { return () unless substr( $$img, $y1$WIDTH + $x-(($$size+1)/2) +, $$size ) = chr(0)x($$size); } for my $y1 ( $y - ( $$size / 2 ) .. $y + ( $$size / 2 ) ) { substr( $$img, $y1 $WIDTH + $x-(($$size+1)/2), $$size+2 ) = +$c x($$size+2); } return 1; } sub yr { # use integer; ## using int() below seemed faster than this. my $str = shift; my @b = map { [ [ $WIDTH, 0 ], [ $HEIGHT, 0 ] ] } 1 .. 256; #$s-> +max; while ( $$str =~ /[^\x00]/g ) { my $i = pos( $$str ) - 1; my $x = $i % $WIDTH; my $y = int( $i / $WIDTH ); my $c = ord( $& ) - 1; $b[ $c ][ 0 ][ 0 ] = $x if $x < $b[ $c ][ 0 ][ 0 ]; $b[ $c ][ 0 ][ 1 ] = $x if $x > $b[ $c ][ 0 ][ 1 ]; $b[ $c ][ 1 ][ 0 ] = $y if $y < $b[ $c ][ 1 ][ 0 ]; $b[ $c ][ 1 ][ 1 ] = $y if $y > $b[ $c ][ 1 ][ 1 ]; } return \@b; } sub buk { my( $str ) = @_; my @b = map[ ( 1e99, 0 ) x 2 ], 1 .. 256; my( $i, $x, $y, $c ) = 0; for my $c ( unpack 'C', $$str ) { $x = $i % $WIDTH; $y = int( $i / $WIDTH ); $b[ $c ][ LEFT ] = $x if $x < $b[ $c ][ LEFT ]; $b[ $c ][ RIGHT ] = $x if $x > $b[ $c ][ RIGHT ]; $b[ $c ][ TOP ] = $y if $y < $b[ $c ][ TOP ]; $b[ $c ][ BOTTOM ] = $y if $y > $b[ $c ][ BOTTOM ]; ++$i; } return \@b; } sub buk2{ my $str = shift; my @b = map[ ( 1e99, 0 ) x 2 ], 1 .. 256; for my $y ( 0 .. $HEIGHT-1 ) { my $x = 0; for my $c ( unpack'C', substr $$str, $y * $WIDTH, $WIDTH ) { $b[ $c ][ LEFT ] = $x if $x < $b[ $c ][ LEFT ]; $b[ $c ][ RIGHT ] = $x if $x > $b[ $c ][ RIGHT ]; $b[ $c ][ TOP ] = $y if $y < $b[ $c ][ TOP ]; $b[ $c ][ BOTTOM ] = $y if $y > $b[ $c ][ BOTTOM ]; ++$x; } } return \@b; } sub buk3{ my $str = shift; my @b = map[ ( 1e99, 0 ) x 2 ], 1 .. 256; for my $y ( 0 .. $HEIGHT-1 ) { my $x = 0; while( substr( $$str, $y * $WIDTH, $WIDTH ) =~ m[(([^\0])+)]g +) { my $c = ord($1); #, $-[0], $+[0]; $b[ $c ][ LEFT ] = $-[0] if $-[0] < $b[ $c ][ LEFT ]; $b[ $c ][ RIGHT ] = $+[0] if $+[0] > $b[ $c ][ RIGHT ]; $b[ $c ][ TOP ] = $y if $y < $b[ $c ][ TOP ]; $b[ $c ][ BOTTOM ] = $y if $y > $b[ $c ][ BOTTOM ]; ++$x; } } return \@b; } my $pdl = chr(0); $pdl x= ( $WIDTH * $HEIGHT ); my( $x, $y ) = ( $WIDTH/2, $HEIGHT/2 ); for my $c ( 1 .. 255 ) { my $size = 3 + rand( 200 ); my $sizeDiv2 = int( ( $size+1 ) / 2 ); do{ ( $x, $y ) = ( $sizeDiv2 + rand( $WIDTH - $size - 1 ), $sizeDi +v2 + rand( $HEIGHT - $size - 1 ) ) } until substr( $pdl, $y * $WIDTH + $x, 1 ) eq chr( 0 ); redo unless makeObj( \$pdl, $x, $y, \$size, chr( $c ) ); } my $start = time; my $yr = yr \$pdl; my $end = time; printf "yr() took %.6f\n", $end - $start; $start = time; my $buk = buk \$pdl; $end = time; printf "buk() took %.6f\n", $end - $start; $start = time; my $buk2 = buk2 \$pdl; $end = time; printf "buk2() took %.6f\n", $end - $start; $start = time; my $buk3 = buk3 \$pdl; $end = time; printf "buk3() took %.6f\n", $end - $start; #<STDIN>; #pp $buk; pp $buk3; [download] With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday' Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error. "Science is about questioning the status quo. Questioning authority". The enemy of (IT) success is complexity. In the absence of evidence, opinion is indistinguishable from prejudice.	[reply] [d/l] [select]
Re^4: PDL: Looking for efficient way to extract sub-images, by finding bounding boxes of "objects" (7000x faster) by vr (Curate) on Nov 21, 2016 at 12:44 UTC
Thanks a lot for your time and interest. I had to rewrite the loop in "buk3" like this: `for my $y ( 0 .. $HEIGHT-1 ) { my $s = substr( $$str, $y * $WIDTH, $WIDTH ); while( $s =~ m[(([^\0])+)]g ) { my $c = ord($1); #, $-[0], $+[0]; $b[ $c ][ LEFT ] = $-[0] if $-[0] < $b[ $c ][ LEFT ]; $b[ $c ][ RIGHT ] = $+[0]-1 if $+[0]-1 > $b[ $c ][ RIGHT ]; $b[ $c ][ TOP ] = $y if $y < $b[ $c ][ TOP ]; $b[ $c ][ BOTTOM ] = $y if $y > $b[ $c ][ BOTTOM ]; } }` [download] Otherwise it goes forever. We can't match globally against substr (as lvalue?), can we? When I run your code (5.24 on Windows), it says: `yr() took 3.484375 buk() took 1.233143 buk2() took 0.670660 buk3() took 0.029448` [download] I.e. no hundreds of seconds, for my sub, at all. I would not otherwise publish it here and call it 'fast' :) Also, your test 'image' is something interesting, we can look at it if: `PDL::IO::Image-> new_from_pdl( pdl([ unpack 'C*', $pdl ]) -> reshape( 1000, 1000 )-> bitnot )-> save( 'buk.png', 'PNG' );` [download] Not exactly representative as real life image. For typical real image it's this: `PDL: Short D [7616,1200] max = 145 s/iter unpack buk2 regex buk3 unpack 3.02 -- -6% -36% -88% buk2 2.84 6% -- -32% -87% regex 1.92 57% 48% -- -81% buk3 0.362 733% 684% 430% --` [download] Anyway, your "buk3" algorithm is fastest.	[reply] [d/l] [select]
Re^5: PDL: Looking for efficient way to extract sub-images, by finding bounding boxes of "objects" (7000x faster) by BrowserUk (Patriarch) on Nov 21, 2016 at 13:44 UTC
Re^6: PDL: Looking for efficient way to extract sub-images, by finding bounding boxes of "objects" (7000x faster) by dbuckhal (Chaplain) on Nov 22, 2016 at 03:51 UTC
Some notes below your chosen depth have not been shown here
Re^5: PDL: Looking for efficient way to extract sub-images, by finding bounding boxes of "objects" (FAQ6: $&) by BrowserUk (Patriarch) on Nov 22, 2016 at 09:19 UTC
Re^6: PDL: Looking for efficient way to extract sub-images, by finding bounding boxes of "objects" (FAQ6: $&) by vr (Curate) on Nov 22, 2016 at 10:42 UTC
Some notes below your chosen depth have not been shown here
Re^3: PDL: Looking for efficient way to extract sub-images, by finding bounding boxes of "objects" (1700xfaster) by BrowserUk (Patriarch) on Nov 20, 2016 at 10:39 UTC
You might try this. It expects a reference to the raw string, and the width & height available in globals -- adapt to your preferences: `sub buk { my( $str ) = @_; my @b = map[ ( 1e99, 0 ) x 2 ], 1 .. 256; #$s-> max; my( $i, $x, $y, $c ) = 0; for my $c ( unpack 'C*', $$str ) { $x = $i % $WIDTH; $y = int( $i / $WIDTH ); $b[ $c ][ LEFT ] = $x if $x < $b[ $c ][ LEFT ]; $b[ $c ][ RIGHT ] = $x if $x > $b[ $c ][ RIGHT ]; $b[ $c ][ TOP ] = $y if $y < $b[ $c ][ TOP ]; $b[ $c ][ BOTTOM ] = $y if $y > $b[ $c ][ BOTTOM ]; ++$i; } return \@b; }` [download] In my simulated tests, it performs 224 times faster than your posted code on a 1000x1000x1byte image and 1700 times faster on a 2000x2000: `C:\test>1176081 -WIDTH=999 -HEIGHT=999 yr() took 329.728896 buk() took 1.474515 C:\test>1176081 -WIDTH=2000 -HEIGHT=2000 yr() took 10265.728694 buk() took 5.976180` [download] It is also trivially adaptable to images that use short/word/doubleword/quadword sized pixels. I have a couple of ideas for a faster version, which I'll post if they pan out. With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday' Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error. "Science is about questioning the status quo. Questioning authority". The enemy of (IT) success is complexity. In the absence of evidence, opinion is indistinguishable from prejudice.	[reply] [d/l] [select]