 Syntactic Confectionery Delight PerlMonks

### Re^3: PDL: Looking for efficient way to extract sub-images, by finding bounding boxes of "objects" (7000x faster)

by BrowserUk (Patriarch)
 on Nov 20, 2016 at 13:57 UTC ( #1176204=note: print w/replies, xml ) Need Help??

buk2() below is about 50% faster than buk(), but the best is buk3() which 36x faster than buk() and a cool 7000 times faster than yr(). (That astounded me, and I didn't believe it at first, but it's true!):

```C:\test>1176081 -WIDTH=1000 -HEIGHT=1000
yr() took 295.895124
buk() took 1.483303
buk2() took 1.113015
buk3() took 0.042507

And a run on a 10kx10k "image" (without yr() as it would take days.):

```C:\test>1176081 -WIDTH=10000 -HEIGHT=10000
yr() took 0.000003
buk() took 150.481585
buk2() took 102.911382
buk3() took 3.710700

The test is only a crude simulation, so things may not pan out quite so well with the real data, but it worth a look :)

My test harness:

```#! perl -slw
use strict;
use Time::HiRes qw[ time ];
use Data::Dump qw[ pp ];
use constant { LEFT=>0, RIGHT=>1, TOP=>2, BOTTOM=>3 };

our \$WIDTH  //= 1000;
our \$HEIGHT //= 1000;

sub makeObj{
my( \$img, \$x, \$y, \$size, \$c ) = @_;
for my \$y1 ( \$y - ( \$\$size / 2 ) .. \$y + ( \$\$size / 2 ) ) {
return () unless substr( \$\$img, \$y1*\$WIDTH + \$x-((\$\$size+1)/2)
+, \$\$size ) = chr(0)x(\$\$size);
}
for my \$y1 ( \$y - ( \$\$size / 2 ) .. \$y + ( \$\$size / 2 ) ) {
substr( \$\$img, \$y1 * \$WIDTH + \$x-((\$\$size+1)/2), \$\$size+2 ) =
+\$c x(\$\$size+2);
}
return 1;
}

sub yr {
#    use integer; ## using int() below seemed faster than this.
my \$str = shift;

my @b = map { [ [ \$WIDTH, 0 ], [ \$HEIGHT, 0 ] ] } 1 .. 256; #\$s->
+max;

while ( \$\$str =~ /[^\x00]/g ) {
my \$i = pos( \$\$str ) - 1;
my \$x = \$i % \$WIDTH;
my \$y = int( \$i / \$WIDTH );
my \$c = ord( \$& ) - 1;
\$b[ \$c ][ 0 ][ 0 ] = \$x if \$x < \$b[ \$c ][ 0 ][ 0 ];
\$b[ \$c ][ 0 ][ 1 ] = \$x if \$x > \$b[ \$c ][ 0 ][ 1 ];
\$b[ \$c ][ 1 ][ 0 ] = \$y if \$y < \$b[ \$c ][ 1 ][ 0 ];
\$b[ \$c ][ 1 ][ 1 ] = \$y if \$y > \$b[ \$c ][ 1 ][ 1 ];
}
return \@b;
}

sub buk {
my( \$str ) = @_;

my @b = map[ ( 1e99, 0 ) x 2 ], 1 .. 256;

my( \$i, \$x, \$y, \$c ) = 0;
for my \$c ( unpack 'C*', \$\$str ) {
\$x = \$i % \$WIDTH;
\$y = int( \$i / \$WIDTH );
\$b[ \$c ][ LEFT   ] = \$x if \$x < \$b[ \$c ][ LEFT   ];
\$b[ \$c ][ RIGHT  ] = \$x if \$x > \$b[ \$c ][ RIGHT  ];
\$b[ \$c ][ TOP    ] = \$y if \$y < \$b[ \$c ][ TOP    ];
\$b[ \$c ][ BOTTOM ] = \$y if \$y > \$b[ \$c ][ BOTTOM ];
++\$i;
}
return \@b;
}

sub buk2{
my \$str = shift;
my @b = map[ ( 1e99, 0 ) x 2 ], 1 .. 256;

for my \$y ( 0 .. \$HEIGHT-1 ) {
my \$x = 0;
for my \$c ( unpack'C*', substr \$\$str, \$y * \$WIDTH, \$WIDTH ) {
\$b[ \$c ][ LEFT   ] = \$x if \$x < \$b[ \$c ][ LEFT   ];
\$b[ \$c ][ RIGHT  ] = \$x if \$x > \$b[ \$c ][ RIGHT  ];
\$b[ \$c ][ TOP    ] = \$y if \$y < \$b[ \$c ][ TOP    ];
\$b[ \$c ][ BOTTOM ] = \$y if \$y > \$b[ \$c ][ BOTTOM ];
++\$x;
}
}
return \@b;
}

sub buk3{
my \$str = shift;
my @b = map[ ( 1e99, 0 ) x 2 ], 1 .. 256;

for my \$y ( 0 .. \$HEIGHT-1 ) {
my \$x = 0;
while( substr( \$\$str, \$y * \$WIDTH, \$WIDTH ) =~ m[(([^\0])+)]g
+) {
my \$c = ord(\$1); #, \$-, \$+;
\$b[ \$c ][ LEFT   ] = \$- if \$- < \$b[ \$c ][ LEFT   ];
\$b[ \$c ][ RIGHT  ] = \$+ if \$+ > \$b[ \$c ][ RIGHT  ];
\$b[ \$c ][ TOP    ] = \$y    if    \$y < \$b[ \$c ][ TOP    ];
\$b[ \$c ][ BOTTOM ] = \$y    if    \$y > \$b[ \$c ][ BOTTOM ];
++\$x;
}
}
return \@b;
}

my \$pdl = chr(0); \$pdl x= ( \$WIDTH * \$HEIGHT );

my( \$x, \$y ) =  ( \$WIDTH/2, \$HEIGHT/2 );
for my \$c ( 1 .. 255 ) {
my \$size = 3 + rand( 200 );
my \$sizeDiv2 = int( ( \$size+1 ) / 2 );
do{
( \$x, \$y ) = ( \$sizeDiv2 + rand( \$WIDTH - \$size - 1 ), \$sizeDi
+v2 + rand( \$HEIGHT - \$size - 1 ) )
} until substr( \$pdl, \$y * \$WIDTH + \$x, 1 ) eq chr( 0 );
redo unless makeObj( \\$pdl, \$x, \$y, \\$size, chr( \$c ) );
}

my \$start = time;
my \$yr = yr \\$pdl;
my \$end = time;

printf "yr() took %.6f\n", \$end - \$start;

\$start = time;
my \$buk = buk \\$pdl;
\$end = time;

printf "buk() took %.6f\n", \$end - \$start;

\$start = time;
my \$buk2 = buk2 \\$pdl;
\$end = time;

printf "buk2() took %.6f\n", \$end - \$start;

\$start = time;
my \$buk3 = buk3 \\$pdl;
\$end = time;

printf "buk3() took %.6f\n", \$end - \$start;
#<STDIN>;
#pp \$buk; pp \$buk3;

With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday'
Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
"Science is about questioning the status quo. Questioning authority". The enemy of (IT) success is complexity.
In the absence of evidence, opinion is indistinguishable from prejudice.
• Comment on Re^3: PDL: Looking for efficient way to extract sub-images, by finding bounding boxes of "objects" (7000x faster)

Replies are listed 'Best First'.
Re^4: PDL: Looking for efficient way to extract sub-images, by finding bounding boxes of "objects" (7000x faster)
by vr (Curate) on Nov 21, 2016 at 12:44 UTC
Thanks a lot for your time and interest. I had to rewrite the loop in "buk3" like this:
```for my \$y ( 0 .. \$HEIGHT-1 ) {
my \$s = substr( \$\$str, \$y * \$WIDTH, \$WIDTH );
while( \$s =~ m[(([^\0])+)]g ) {
my \$c = ord(\$1); #, \$-, \$+;
\$b[ \$c ][ LEFT   ] = \$-   if \$-   < \$b[ \$c ][ LEFT   ];
\$b[ \$c ][ RIGHT  ] = \$+-1 if \$+-1 > \$b[ \$c ][ RIGHT  ];
\$b[ \$c ][ TOP    ] = \$y      if      \$y < \$b[ \$c ][ TOP    ];
\$b[ \$c ][ BOTTOM ] = \$y      if      \$y > \$b[ \$c ][ BOTTOM ];
}
}
Otherwise it goes forever. We can't match globally against substr (as lvalue?), can we? When I run your code (5.24 on Windows), it says:
```yr() took 3.484375
buk() took 1.233143
buk2() took 0.670660
buk3() took 0.029448
I.e. no hundreds of seconds, for my sub, at all. I would not otherwise publish it here and call it 'fast' :) Also, your test 'image' is something interesting, we can look at it if:
```PDL::IO::Image-> new_from_pdl(
pdl([ unpack 'C*', \$pdl ])
-> reshape( 1000, 1000 )-> bitnot
)-> save( 'buk.png', 'PNG' );
Not exactly representative as real life image. For typical real image it's this:
```PDL: Short D [7616,1200]
max = 145

s/iter unpack   buk2  regex   buk3
unpack   3.02     --    -6%   -36%   -88%
buk2     2.84     6%     --   -32%   -87%
regex    1.92    57%    48%     --   -81%
buk3    0.362   733%   684%   430%     --
Anyway, your "buk3" algorithm is fastest.

Curiouser and curiouser!

When I run the code under 5.10, I get the sort of timings I posted above:

```C:\test>1176081 -WIDTH=1000 -HEIGHT=1000
yr() took 330.252300
buk() took 1.584859
buk2() took 1.277809
buk3() took 0.226701

But if I run it under 5.22, your sub runs very much faster* and buk3() doesn't terminate at all:

which is weird and indicates (IMO) a bug in the later versions

(*I have an idea about the cause of the slowness in 5.10; I'll need to think of a way to verify it.)

Modifying buk3() along the lines of your modification, but taking an lvalue ref outside the while loop and using it within the loop, allows it to work again:

```sub buk3{
my \$str = shift;
my @b = map[ ( 1e99, 0 ) x 2 ], 1 .. 256;

for my \$y ( 0 .. \$HEIGHT-1 ) {
my \$ref = \substr( \$\$str, \$y * \$WIDTH, \$WIDTH );
while( \$\$ref  =~ m[((.)\2*)]sg ) {
my \$c = ord(\$1);
\$b[ \$c ][ LEFT   ] = \$-   if  \$-    < \$b[ \$c ][ LEFT
+   ];
\$b[ \$c ][ RIGHT  ] = \$+-1 if  \$+-1  > \$b[ \$c ][ RIGH
+T  ];
\$b[ \$c ][ TOP    ] = \$y      if        \$y < \$b[ \$c ][ TOP
+   ];
\$b[ \$c ][ BOTTOM ] = \$y      if        \$y > \$b[ \$c ][ BOTT
+OM ];
}
}
return \@b;
}

C:\test>\perl22\bin\perl 1176081.pl -WIDTH=1000 -HEIGHT=1000
yr() took 2.438320
buk() took 1.068071
buk2() took 0.681810
buk3() took 0.160666

Which is okay, but a strange difference.

It kinda takes the steam out of my amazing speedup figures -- 15x instead of 7000x -- but the thrill is transitory anyway :)

And now I can run your sub on a larger image, even that gain is far less:

```C:\test>\perl22\bin\perl 1176081.pl -WIDTH=10000 -HEIGHT=10000
yr() took 12.714030
buk() took 110.152981
buk2() took 70.100982
buk3() took 8.158657

With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday'
Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
"Science is about questioning the status quo. Questioning authority". The enemy of (IT) success is complexity.
In the absence of evidence, opinion is indistinguishable from prejudice.
(*I have an idea about the cause of the slowness in 5.10; I'll need to think of a way to verify it.)
I came across this article a little while back. It might help explain the speed difference. I certain have seen a difference running a Perl application between 5.1x and 5.22.

How We Spent Two Days Making Perl Faster

Hope you find it interesting!

It turns out that the reason for the pathologic behaviour of your code under 5.10 is entirely down to your use of \$&, as described in the FAQ since circa 1999 or before.

Replacing my \$c = ord( \$& ) - 1; with my \$c = ord( substr \$\$str, \$-, 1 ) - 1; entirely eliminates the performance problem on 5.10.1.

With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday'
Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
"Science is about questioning the status quo. Questioning authority". The enemy of (IT) success is complexity.
In the absence of evidence, opinion is indistinguishable from prejudice.
Thanks. I can allow myself :) to run all my code under new versions, and disregard performance issues of \$&. Btw, using \$& instead of creating capture groups and \$1 gives a slight boost to "buk3". Also, I earlier experimented with "yr" - using "pos" (as it is) also gives quite a boost against using \$- or \$+.

Create A New User
Domain Nodelet?
Node Status?
node history
Node Type: note [id://1176204]
help
Chatterbox?
and the web crawler heard nothing...

How do I use this? | Other CB clients
Other Users?
Others meditating upon the Monastery: (7)
As of 2023-05-30 18:47 GMT
Sections?
Information?
Find Nodes?
Leftovers?
Voting Booth?

No recent polls found

Notices?