Re^2: PDL: Looking for efficient way to extract sub-images, by finding bounding boxes of "objects"

Replies are listed 'Best First'.
Re^3: PDL: Looking for efficient way to extract sub-images, by finding bounding boxes of "objects" (7000x faster) by BrowserUk (Patriarch) on Nov 20, 2016 at 13:57 UTC
buk2() below is about 50% faster than buk(), but the best is buk3() which 36x faster than buk() and a cool 7000 times faster than yr(). (That astounded me, and I didn't believe it at first, but it's true!): `C:\test>1176081 -WIDTH=1000 -HEIGHT=1000 yr() took 295.895124 buk() took 1.483303 buk2() took 1.113015 buk3() took 0.042507` [download] And a run on a 10kx10k "image" (without yr() as it would take days.): `C:\test>1176081 -WIDTH=10000 -HEIGHT=10000 yr() took 0.000003 buk() took 150.481585 buk2() took 102.911382 buk3() took 3.710700` [download] The test is only a crude simulation, so things may not pan out quite so well with the real data, but it worth a look :) My test harness: #! perl -slw use strict; use Time::HiRes qw[ time ]; use Data::Dump qw[ pp ]; use constant { LEFT=>0, RIGHT=>1, TOP=>2, BOTTOM=>3 }; our $WIDTH //= 1000; our $HEIGHT //= 1000; sub makeObj{ my( $img, $x, $y, $size, $c ) = @_; for my $y1 ( $y - ( $$size / 2 ) .. $y + ( $$size / 2 ) ) { return () unless substr( $$img, $y1$WIDTH + $x-(($$size+1)/2) +, $$size ) = chr(0)x($$size); } for my $y1 ( $y - ( $$size / 2 ) .. $y + ( $$size / 2 ) ) { substr( $$img, $y1 $WIDTH + $x-(($$size+1)/2), $$size+2 ) = +$c x($$size+2); } return 1; } sub yr { # use integer; ## using int() below seemed faster than this. my $str = shift; my @b = map { [ [ $WIDTH, 0 ], [ $HEIGHT, 0 ] ] } 1 .. 256; #$s-> +max; while ( $$str =~ /[^\x00]/g ) { my $i = pos( $$str ) - 1; my $x = $i % $WIDTH; my $y = int( $i / $WIDTH ); my $c = ord( $& ) - 1; $b[ $c ][ 0 ][ 0 ] = $x if $x < $b[ $c ][ 0 ][ 0 ]; $b[ $c ][ 0 ][ 1 ] = $x if $x > $b[ $c ][ 0 ][ 1 ]; $b[ $c ][ 1 ][ 0 ] = $y if $y < $b[ $c ][ 1 ][ 0 ]; $b[ $c ][ 1 ][ 1 ] = $y if $y > $b[ $c ][ 1 ][ 1 ]; } return \@b; } sub buk { my( $str ) = @_; my @b = map[ ( 1e99, 0 ) x 2 ], 1 .. 256; my( $i, $x, $y, $c ) = 0; for my $c ( unpack 'C', $$str ) { $x = $i % $WIDTH; $y = int( $i / $WIDTH ); $b[ $c ][ LEFT ] = $x if $x < $b[ $c ][ LEFT ]; $b[ $c ][ RIGHT ] = $x if $x > $b[ $c ][ RIGHT ]; $b[ $c ][ TOP ] = $y if $y < $b[ $c ][ TOP ]; $b[ $c ][ BOTTOM ] = $y if $y > $b[ $c ][ BOTTOM ]; ++$i; } return \@b; } sub buk2{ my $str = shift; my @b = map[ ( 1e99, 0 ) x 2 ], 1 .. 256; for my $y ( 0 .. $HEIGHT-1 ) { my $x = 0; for my $c ( unpack'C', substr $$str, $y * $WIDTH, $WIDTH ) { $b[ $c ][ LEFT ] = $x if $x < $b[ $c ][ LEFT ]; $b[ $c ][ RIGHT ] = $x if $x > $b[ $c ][ RIGHT ]; $b[ $c ][ TOP ] = $y if $y < $b[ $c ][ TOP ]; $b[ $c ][ BOTTOM ] = $y if $y > $b[ $c ][ BOTTOM ]; ++$x; } } return \@b; } sub buk3{ my $str = shift; my @b = map[ ( 1e99, 0 ) x 2 ], 1 .. 256; for my $y ( 0 .. $HEIGHT-1 ) { my $x = 0; while( substr( $$str, $y * $WIDTH, $WIDTH ) =~ m[(([^\0])+)]g +) { my $c = ord($1); #, $-[0], $+[0]; $b[ $c ][ LEFT ] = $-[0] if $-[0] < $b[ $c ][ LEFT ]; $b[ $c ][ RIGHT ] = $+[0] if $+[0] > $b[ $c ][ RIGHT ]; $b[ $c ][ TOP ] = $y if $y < $b[ $c ][ TOP ]; $b[ $c ][ BOTTOM ] = $y if $y > $b[ $c ][ BOTTOM ]; ++$x; } } return \@b; } my $pdl = chr(0); $pdl x= ( $WIDTH * $HEIGHT ); my( $x, $y ) = ( $WIDTH/2, $HEIGHT/2 ); for my $c ( 1 .. 255 ) { my $size = 3 + rand( 200 ); my $sizeDiv2 = int( ( $size+1 ) / 2 ); do{ ( $x, $y ) = ( $sizeDiv2 + rand( $WIDTH - $size - 1 ), $sizeDi +v2 + rand( $HEIGHT - $size - 1 ) ) } until substr( $pdl, $y * $WIDTH + $x, 1 ) eq chr( 0 ); redo unless makeObj( \$pdl, $x, $y, \$size, chr( $c ) ); } my $start = time; my $yr = yr \$pdl; my $end = time; printf "yr() took %.6f\n", $end - $start; $start = time; my $buk = buk \$pdl; $end = time; printf "buk() took %.6f\n", $end - $start; $start = time; my $buk2 = buk2 \$pdl; $end = time; printf "buk2() took %.6f\n", $end - $start; $start = time; my $buk3 = buk3 \$pdl; $end = time; printf "buk3() took %.6f\n", $end - $start; #<STDIN>; #pp $buk; pp $buk3; [download] With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday' Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error. "Science is about questioning the status quo. Questioning authority". The enemy of (IT) success is complexity. In the absence of evidence, opinion is indistinguishable from prejudice.	[reply] [d/l] [select]
Re^4: PDL: Looking for efficient way to extract sub-images, by finding bounding boxes of "objects" (7000x faster) by vr (Curate) on Nov 21, 2016 at 12:44 UTC
Thanks a lot for your time and interest. I had to rewrite the loop in "buk3" like this: `for my $y ( 0 .. $HEIGHT-1 ) { my $s = substr( $$str, $y * $WIDTH, $WIDTH ); while( $s =~ m[(([^\0])+)]g ) { my $c = ord($1); #, $-[0], $+[0]; $b[ $c ][ LEFT ] = $-[0] if $-[0] < $b[ $c ][ LEFT ]; $b[ $c ][ RIGHT ] = $+[0]-1 if $+[0]-1 > $b[ $c ][ RIGHT ]; $b[ $c ][ TOP ] = $y if $y < $b[ $c ][ TOP ]; $b[ $c ][ BOTTOM ] = $y if $y > $b[ $c ][ BOTTOM ]; } }` [download] Otherwise it goes forever. We can't match globally against substr (as lvalue?), can we? When I run your code (5.24 on Windows), it says: `yr() took 3.484375 buk() took 1.233143 buk2() took 0.670660 buk3() took 0.029448` [download] I.e. no hundreds of seconds, for my sub, at all. I would not otherwise publish it here and call it 'fast' :) Also, your test 'image' is something interesting, we can look at it if: `PDL::IO::Image-> new_from_pdl( pdl([ unpack 'C*', $pdl ]) -> reshape( 1000, 1000 )-> bitnot )-> save( 'buk.png', 'PNG' );` [download] Not exactly representative as real life image. For typical real image it's this: `PDL: Short D [7616,1200] max = 145 s/iter unpack buk2 regex buk3 unpack 3.02 -- -6% -36% -88% buk2 2.84 6% -- -32% -87% regex 1.92 57% 48% -- -81% buk3 0.362 733% 684% 430% --` [download] Anyway, your "buk3" algorithm is fastest.	[reply] [d/l] [select]
Re^5: PDL: Looking for efficient way to extract sub-images, by finding bounding boxes of "objects" (7000x faster) by BrowserUk (Patriarch) on Nov 21, 2016 at 13:44 UTC
Curiouser and curiouser! When I run the code under 5.10, I get the sort of timings I posted above: `C:\test>1176081 -WIDTH=1000 -HEIGHT=1000 yr() took 330.252300 buk() took 1.584859 buk2() took 1.277809 buk3() took 0.226701` [download] But if I run it under 5.22, your sub runs very much faster* and buk3() doesn't terminate at all: which is weird and indicates (IMO) a bug in the later versions (I have an idea about the cause of the slowness in 5.10; I'll need to think of a way to verify it.) Modifying buk3() along the lines of your modification, but taking an lvalue ref outside the while loop and using it within the loop, allows it to work again: sub buk3{ my $str = shift; my @b = map[ ( 1e99, 0 ) x 2 ], 1 .. 256; for my $y ( 0 .. $HEIGHT-1 ) { my $ref = \substr( $$str, $y $WIDTH, $WIDTH ); while( $$ref =~ m[((.)\2*)]sg ) { my $c = ord($1); $b[ $c ][ LEFT ] = $-[0] if $-[0] < $b[ $c ][ LEFT + ]; $b[ $c ][ RIGHT ] = $+[0]-1 if $+[0]-1 > $b[ $c ][ RIGH +T ]; $b[ $c ][ TOP ] = $y if $y < $b[ $c ][ TOP + ]; $b[ $c ][ BOTTOM ] = $y if $y > $b[ $c ][ BOTT +OM ]; } } return \@b; } C:\test>\perl22\bin\perl 1176081.pl -WIDTH=1000 -HEIGHT=1000 yr() took 2.438320 buk() took 1.068071 buk2() took 0.681810 buk3() took 0.160666 [download] Which is okay, but a strange difference. It kinda takes the steam out of my amazing speedup figures -- 15x instead of 7000x -- but the thrill is transitory anyway :) And now I can run your sub on a larger image, even that gain is far less: `C:\test>\perl22\bin\perl 1176081.pl -WIDTH=10000 -HEIGHT=10000 yr() took 12.714030 buk() took 110.152981 buk2() took 70.100982 buk3() took 8.158657` [download] With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday' Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error. "Science is about questioning the status quo. Questioning authority". The enemy of (IT) success is complexity. In the absence of evidence, opinion is indistinguishable from prejudice.	[reply] [d/l] [select]
Re^6: PDL: Looking for efficient way to extract sub-images, by finding bounding boxes of "objects" (7000x faster) by dbuckhal (Chaplain) on Nov 22, 2016 at 03:51 UTC
Re^7: PDL: Looking for efficient way to extract sub-images, by finding bounding boxes of "objects" (7000x faster) by BrowserUk (Patriarch) on Nov 22, 2016 at 08:39 UTC
Re^5: PDL: Looking for efficient way to extract sub-images, by finding bounding boxes of "objects" (FAQ6: $&) by BrowserUk (Patriarch) on Nov 22, 2016 at 09:19 UTC
It turns out that the reason for the pathologic behaviour of your code under 5.10 is entirely down to your use of $&, as described in the FAQ since circa 1999 or before. Replacing `my $c = ord( $& ) - 1;` with `my $c = ord( substr $$str, $-[0], 1 ) - 1;` entirely eliminates the performance problem on 5.10.1. With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday' Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error. "Science is about questioning the status quo. Questioning authority". The enemy of (IT) success is complexity. In the absence of evidence, opinion is indistinguishable from prejudice.	[reply] [d/l] [select]
Re^6: PDL: Looking for efficient way to extract sub-images, by finding bounding boxes of "objects" (FAQ6: $&) by vr (Curate) on Nov 22, 2016 at 10:42 UTC
Re^7: PDL: Looking for efficient way to extract sub-images, by finding bounding boxes of "objects" (FAQ6: $&) by BrowserUk (Patriarch) on Nov 22, 2016 at 11:19 UTC
Re^3: PDL: Looking for efficient way to extract sub-images, by finding bounding boxes of "objects" (1700xfaster) by BrowserUk (Patriarch) on Nov 20, 2016 at 10:39 UTC
You might try this. It expects a reference to the raw string, and the width & height available in globals -- adapt to your preferences: `sub buk { my( $str ) = @_; my @b = map[ ( 1e99, 0 ) x 2 ], 1 .. 256; #$s-> max; my( $i, $x, $y, $c ) = 0; for my $c ( unpack 'C*', $$str ) { $x = $i % $WIDTH; $y = int( $i / $WIDTH ); $b[ $c ][ LEFT ] = $x if $x < $b[ $c ][ LEFT ]; $b[ $c ][ RIGHT ] = $x if $x > $b[ $c ][ RIGHT ]; $b[ $c ][ TOP ] = $y if $y < $b[ $c ][ TOP ]; $b[ $c ][ BOTTOM ] = $y if $y > $b[ $c ][ BOTTOM ]; ++$i; } return \@b; }` [download] In my simulated tests, it performs 224 times faster than your posted code on a 1000x1000x1byte image and 1700 times faster on a 2000x2000: `C:\test>1176081 -WIDTH=999 -HEIGHT=999 yr() took 329.728896 buk() took 1.474515 C:\test>1176081 -WIDTH=2000 -HEIGHT=2000 yr() took 10265.728694 buk() took 5.976180` [download] It is also trivially adaptable to images that use short/word/doubleword/quadword sized pixels. I have a couple of ideas for a faster version, which I'll post if they pan out. With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday' Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error. "Science is about questioning the status quo. Questioning authority". The enemy of (IT) success is complexity. In the absence of evidence, opinion is indistinguishable from prejudice.	[reply] [d/l] [select]


Just another Perl shrine
	PerlMonks