Beefy Boxes and Bandwidth Generously Provided by pair Networks
Just another Perl shrine
 
PerlMonks  

Re^2: PDL: Looking for efficient way to extract sub-images, by finding bounding boxes of "objects"

by vr (Curate)
on Nov 19, 2016 at 07:23 UTC ( [id://1176127]=note: print w/replies, xml ) Need Help??


in reply to Re: PDL: Looking for efficient way to extract sub-images, by finding bounding boxes of "objects"
in thread PDL: Looking for efficient way to extract sub-images, by finding bounding boxes of "objects"

You are right, I was hoping to stay (do all work) in PDL domain, i.e. maybe I overlooked function e.g. similar to CORE::index or even List::Util::first.

Now my solution is this:

# $s is our segmented image use integer; my ( $w, $h ) = $s-> dims; my $str = ${ $s-> byte-> get_dataref }; my @b = map { [ [ $w, 0 ], [ $h, 0 ] ] } 1 .. $s-> max; while ( $str =~ /[^\x00]/g ) { my $i = pos( $str ) - 1; my $x = $i % $w; my $y = $i / $w; my $c = ord( $& ) - 1; $b[ $c ][ 0 ][ 0 ] = $x if $x < $b[ $c ][ 0 ][ 0 ]; $b[ $c ][ 0 ][ 1 ] = $x if $x > $b[ $c ][ 0 ][ 1 ]; $b[ $c ][ 1 ][ 0 ] = $y if $y < $b[ $c ][ 1 ][ 0 ]; $b[ $c ][ 1 ][ 1 ] = $y if $y > $b[ $c ][ 1 ][ 1 ]; }

It's fast, and will get slightly more complex for 2-byte "characters", if number of "objects" is more than 255.

Btw, this single scan with regex engine is twice (or _only_ twice?) as fast as multiple scans with index and rindex (actually 4 scans per "object", on concatenated rows and concatenated columns).

p.s. But, ehm, direct per-pixel scan (accessing $s-> at( $x, $y )) is 5 times slower than PDL-only solution.
  • Comment on Re^2: PDL: Looking for efficient way to extract sub-images, by finding bounding boxes of "objects"
  • Download Code

Replies are listed 'Best First'.
Re^3: PDL: Looking for efficient way to extract sub-images, by finding bounding boxes of "objects" (7000x faster)
by BrowserUk (Patriarch) on Nov 20, 2016 at 13:57 UTC

    buk2() below is about 50% faster than buk(), but the best is buk3() which 36x faster than buk() and a cool 7000 times faster than yr(). (That astounded me, and I didn't believe it at first, but it's true!):

    C:\test>1176081 -WIDTH=1000 -HEIGHT=1000 yr() took 295.895124 buk() took 1.483303 buk2() took 1.113015 buk3() took 0.042507

    And a run on a 10kx10k "image" (without yr() as it would take days.):

    C:\test>1176081 -WIDTH=10000 -HEIGHT=10000 yr() took 0.000003 buk() took 150.481585 buk2() took 102.911382 buk3() took 3.710700

    The test is only a crude simulation, so things may not pan out quite so well with the real data, but it worth a look :)

    My test harness:

    #! perl -slw use strict; use Time::HiRes qw[ time ]; use Data::Dump qw[ pp ]; use constant { LEFT=>0, RIGHT=>1, TOP=>2, BOTTOM=>3 }; our $WIDTH //= 1000; our $HEIGHT //= 1000; sub makeObj{ my( $img, $x, $y, $size, $c ) = @_; for my $y1 ( $y - ( $$size / 2 ) .. $y + ( $$size / 2 ) ) { return () unless substr( $$img, $y1*$WIDTH + $x-(($$size+1)/2) +, $$size ) = chr(0)x($$size); } for my $y1 ( $y - ( $$size / 2 ) .. $y + ( $$size / 2 ) ) { substr( $$img, $y1 * $WIDTH + $x-(($$size+1)/2), $$size+2 ) = +$c x($$size+2); } return 1; } sub yr { # use integer; ## using int() below seemed faster than this. my $str = shift; my @b = map { [ [ $WIDTH, 0 ], [ $HEIGHT, 0 ] ] } 1 .. 256; #$s-> +max; while ( $$str =~ /[^\x00]/g ) { my $i = pos( $$str ) - 1; my $x = $i % $WIDTH; my $y = int( $i / $WIDTH ); my $c = ord( $& ) - 1; $b[ $c ][ 0 ][ 0 ] = $x if $x < $b[ $c ][ 0 ][ 0 ]; $b[ $c ][ 0 ][ 1 ] = $x if $x > $b[ $c ][ 0 ][ 1 ]; $b[ $c ][ 1 ][ 0 ] = $y if $y < $b[ $c ][ 1 ][ 0 ]; $b[ $c ][ 1 ][ 1 ] = $y if $y > $b[ $c ][ 1 ][ 1 ]; } return \@b; } sub buk { my( $str ) = @_; my @b = map[ ( 1e99, 0 ) x 2 ], 1 .. 256; my( $i, $x, $y, $c ) = 0; for my $c ( unpack 'C*', $$str ) { $x = $i % $WIDTH; $y = int( $i / $WIDTH ); $b[ $c ][ LEFT ] = $x if $x < $b[ $c ][ LEFT ]; $b[ $c ][ RIGHT ] = $x if $x > $b[ $c ][ RIGHT ]; $b[ $c ][ TOP ] = $y if $y < $b[ $c ][ TOP ]; $b[ $c ][ BOTTOM ] = $y if $y > $b[ $c ][ BOTTOM ]; ++$i; } return \@b; } sub buk2{ my $str = shift; my @b = map[ ( 1e99, 0 ) x 2 ], 1 .. 256; for my $y ( 0 .. $HEIGHT-1 ) { my $x = 0; for my $c ( unpack'C*', substr $$str, $y * $WIDTH, $WIDTH ) { $b[ $c ][ LEFT ] = $x if $x < $b[ $c ][ LEFT ]; $b[ $c ][ RIGHT ] = $x if $x > $b[ $c ][ RIGHT ]; $b[ $c ][ TOP ] = $y if $y < $b[ $c ][ TOP ]; $b[ $c ][ BOTTOM ] = $y if $y > $b[ $c ][ BOTTOM ]; ++$x; } } return \@b; } sub buk3{ my $str = shift; my @b = map[ ( 1e99, 0 ) x 2 ], 1 .. 256; for my $y ( 0 .. $HEIGHT-1 ) { my $x = 0; while( substr( $$str, $y * $WIDTH, $WIDTH ) =~ m[(([^\0])+)]g +) { my $c = ord($1); #, $-[0], $+[0]; $b[ $c ][ LEFT ] = $-[0] if $-[0] < $b[ $c ][ LEFT ]; $b[ $c ][ RIGHT ] = $+[0] if $+[0] > $b[ $c ][ RIGHT ]; $b[ $c ][ TOP ] = $y if $y < $b[ $c ][ TOP ]; $b[ $c ][ BOTTOM ] = $y if $y > $b[ $c ][ BOTTOM ]; ++$x; } } return \@b; } my $pdl = chr(0); $pdl x= ( $WIDTH * $HEIGHT ); my( $x, $y ) = ( $WIDTH/2, $HEIGHT/2 ); for my $c ( 1 .. 255 ) { my $size = 3 + rand( 200 ); my $sizeDiv2 = int( ( $size+1 ) / 2 ); do{ ( $x, $y ) = ( $sizeDiv2 + rand( $WIDTH - $size - 1 ), $sizeDi +v2 + rand( $HEIGHT - $size - 1 ) ) } until substr( $pdl, $y * $WIDTH + $x, 1 ) eq chr( 0 ); redo unless makeObj( \$pdl, $x, $y, \$size, chr( $c ) ); } my $start = time; my $yr = yr \$pdl; my $end = time; printf "yr() took %.6f\n", $end - $start; $start = time; my $buk = buk \$pdl; $end = time; printf "buk() took %.6f\n", $end - $start; $start = time; my $buk2 = buk2 \$pdl; $end = time; printf "buk2() took %.6f\n", $end - $start; $start = time; my $buk3 = buk3 \$pdl; $end = time; printf "buk3() took %.6f\n", $end - $start; #<STDIN>; #pp $buk; pp $buk3;

    With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday'
    Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
    "Science is about questioning the status quo. Questioning authority". The enemy of (IT) success is complexity.
    In the absence of evidence, opinion is indistinguishable from prejudice.
      Thanks a lot for your time and interest. I had to rewrite the loop in "buk3" like this:
      for my $y ( 0 .. $HEIGHT-1 ) { my $s = substr( $$str, $y * $WIDTH, $WIDTH ); while( $s =~ m[(([^\0])+)]g ) { my $c = ord($1); #, $-[0], $+[0]; $b[ $c ][ LEFT ] = $-[0] if $-[0] < $b[ $c ][ LEFT ]; $b[ $c ][ RIGHT ] = $+[0]-1 if $+[0]-1 > $b[ $c ][ RIGHT ]; $b[ $c ][ TOP ] = $y if $y < $b[ $c ][ TOP ]; $b[ $c ][ BOTTOM ] = $y if $y > $b[ $c ][ BOTTOM ]; } }
      Otherwise it goes forever. We can't match globally against substr (as lvalue?), can we? When I run your code (5.24 on Windows), it says:
      yr() took 3.484375 buk() took 1.233143 buk2() took 0.670660 buk3() took 0.029448
      I.e. no hundreds of seconds, for my sub, at all. I would not otherwise publish it here and call it 'fast' :) Also, your test 'image' is something interesting, we can look at it if:
      PDL::IO::Image-> new_from_pdl( pdl([ unpack 'C*', $pdl ]) -> reshape( 1000, 1000 )-> bitnot )-> save( 'buk.png', 'PNG' );
      Not exactly representative as real life image. For typical real image it's this:
      PDL: Short D [7616,1200] max = 145 s/iter unpack buk2 regex buk3 unpack 3.02 -- -6% -36% -88% buk2 2.84 6% -- -32% -87% regex 1.92 57% 48% -- -81% buk3 0.362 733% 684% 430% --
      Anyway, your "buk3" algorithm is fastest.

        Curiouser and curiouser!

        When I run the code under 5.10, I get the sort of timings I posted above:

        C:\test>1176081 -WIDTH=1000 -HEIGHT=1000 yr() took 330.252300 buk() took 1.584859 buk2() took 1.277809 buk3() took 0.226701

        But if I run it under 5.22, your sub runs very much faster* and buk3() doesn't terminate at all:

        which is weird and indicates (IMO) a bug in the later versions

        (*I have an idea about the cause of the slowness in 5.10; I'll need to think of a way to verify it.)

        Modifying buk3() along the lines of your modification, but taking an lvalue ref outside the while loop and using it within the loop, allows it to work again:

        sub buk3{ my $str = shift; my @b = map[ ( 1e99, 0 ) x 2 ], 1 .. 256; for my $y ( 0 .. $HEIGHT-1 ) { my $ref = \substr( $$str, $y * $WIDTH, $WIDTH ); while( $$ref =~ m[((.)\2*)]sg ) { my $c = ord($1); $b[ $c ][ LEFT ] = $-[0] if $-[0] < $b[ $c ][ LEFT + ]; $b[ $c ][ RIGHT ] = $+[0]-1 if $+[0]-1 > $b[ $c ][ RIGH +T ]; $b[ $c ][ TOP ] = $y if $y < $b[ $c ][ TOP + ]; $b[ $c ][ BOTTOM ] = $y if $y > $b[ $c ][ BOTT +OM ]; } } return \@b; } C:\test>\perl22\bin\perl 1176081.pl -WIDTH=1000 -HEIGHT=1000 yr() took 2.438320 buk() took 1.068071 buk2() took 0.681810 buk3() took 0.160666

        Which is okay, but a strange difference.

        It kinda takes the steam out of my amazing speedup figures -- 15x instead of 7000x -- but the thrill is transitory anyway :)

        And now I can run your sub on a larger image, even that gain is far less:

        C:\test>\perl22\bin\perl 1176081.pl -WIDTH=10000 -HEIGHT=10000 yr() took 12.714030 buk() took 110.152981 buk2() took 70.100982 buk3() took 8.158657

        With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday'
        Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
        "Science is about questioning the status quo. Questioning authority". The enemy of (IT) success is complexity.
        In the absence of evidence, opinion is indistinguishable from prejudice.

        It turns out that the reason for the pathologic behaviour of your code under 5.10 is entirely down to your use of $&, as described in the FAQ since circa 1999 or before.

        Replacing my $c = ord( $& ) - 1; with my $c = ord( substr $$str, $-[0], 1 ) - 1; entirely eliminates the performance problem on 5.10.1.


        With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday'
        Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
        "Science is about questioning the status quo. Questioning authority". The enemy of (IT) success is complexity.
        In the absence of evidence, opinion is indistinguishable from prejudice.
Re^3: PDL: Looking for efficient way to extract sub-images, by finding bounding boxes of "objects" (1700xfaster)
by BrowserUk (Patriarch) on Nov 20, 2016 at 10:39 UTC

    You might try this. It expects a reference to the raw string, and the width & height available in globals -- adapt to your preferences:

    sub buk { my( $str ) = @_; my @b = map[ ( 1e99, 0 ) x 2 ], 1 .. 256; #$s-> max; my( $i, $x, $y, $c ) = 0; for my $c ( unpack 'C*', $$str ) { $x = $i % $WIDTH; $y = int( $i / $WIDTH ); $b[ $c ][ LEFT ] = $x if $x < $b[ $c ][ LEFT ]; $b[ $c ][ RIGHT ] = $x if $x > $b[ $c ][ RIGHT ]; $b[ $c ][ TOP ] = $y if $y < $b[ $c ][ TOP ]; $b[ $c ][ BOTTOM ] = $y if $y > $b[ $c ][ BOTTOM ]; ++$i; } return \@b; }

    In my simulated tests, it performs 224 times faster than your posted code on a 1000x1000x1byte image and 1700 times faster on a 2000x2000:

    C:\test>1176081 -WIDTH=999 -HEIGHT=999 yr() took 329.728896 buk() took 1.474515 C:\test>1176081 -WIDTH=2000 -HEIGHT=2000 yr() took 10265.728694 buk() took 5.976180

    It is also trivially adaptable to images that use short/word/doubleword/quadword sized pixels.

    I have a couple of ideas for a faster version, which I'll post if they pan out.


    With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday'
    Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
    "Science is about questioning the status quo. Questioning authority". The enemy of (IT) success is complexity.
    In the absence of evidence, opinion is indistinguishable from prejudice.

Log In?
Username:
Password:

What's my password?
Create A New User
Domain Nodelet?
Node Status?
node history
Node Type: note [id://1176127]
help
Chatterbox?
and the web crawler heard nothing...

How do I use this?Last hourOther CB clients
Other Users?
Others admiring the Monastery: (4)
As of 2024-04-24 01:04 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?

    No recent polls found