in reply to Why this code is so slow if run in thread?

Sorry for my earlier misdirection. By way of recompense, here is what I believe is a solution (though it is essentially untested for lack of a suitable image) that addresses both the slowness of substr on utf strings within threads (which is just weird) and the problem I originally thought was the cause: the cloning of the returned array.

It avoids the former by doing away with the encoding altogether, searching instead for runs of pairs of non-null characters in the unencoded pdl data; and the latter by accumulating the bounds in a packed binary array stored in a single scalar.

    sub test {
        my $fn  = shift;
        my $img = PDL::IO::Image-> new_from_file( $fn ) or die "Failed to load image";
        my $pdl = $img->pixels_to_pdl->short;
        my $s   = cc8compt( $pdl != 0 );
        my $str = ${ $s-> get_dataref };      # raw 16-bit label data, no decoding
        my ( $w, $h ) = $s-> dims;

        # One record per label: min x, max x, min y, max y (four 16-bit values)
        my $bounds = pack 'n4', $w, 0, $h, 0;
        $bounds x= $s->max;

        for my $y ( 0 .. $h - 1 ) {
            my $s = substr( $str, 2 * $y * $w, 2 * $w );    # one row of labels
            # Each match is a run of non-null byte pairs within the row
            while( $s =~ m[(?:[^\0][^\0])+]g ) {
                my( $l, $r ) = ( $-[0]/2, (($+[0])-1)/2 );
                my $c = ord( $& );            # label taken from the first byte of the run
                vec( $bounds, 4*$c+0, 16 ) = $l if $l < vec( $bounds, 4*$c+0, 16 );
                vec( $bounds, 4*$c+1, 16 ) = $r if $r > vec( $bounds, 4*$c+1, 16 );
                vec( $bounds, 4*$c+2, 16 ) = $y if $y < vec( $bounds, 4*$c+2, 16 );
                vec( $bounds, 4*$c+3, 16 ) = $y if $y > vec( $bounds, 4*$c+3, 16 );
            }
        }
        return $bounds;
    }
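
To make the packed-bounds idea easier to try without an image, here is a minimal, PDL-free sketch. It is not the code above: it swaps the regex scan for a plain unpack of each row (so labels with a zero byte are handled too), and it allocates one record for every label from 0 to the maximum. The 6x3 grid, the little-endian layout and the name bounds_from_labels are assumptions made purely for illustration.

```perl
use strict;
use warnings;

# Hypothetical stand-in for the labelled image: 6x3 grid, 0 = background.
my ( $w, $h ) = ( 6, 3 );
my @rows = (
    [ 0, 1, 1, 0, 2, 0 ],
    [ 0, 1, 0, 0, 2, 2 ],
    [ 0, 0, 0, 0, 0, 2 ],
);
# Assumed layout: rows of little-endian 16-bit labels, back to back.
my $str = join '', map { pack 'v*', @$_ } @rows;

sub bounds_from_labels {
    my ( $str, $w, $h, $max ) = @_;
    # One 8-byte record per label 0..$max: min x, max x, min y, max y.
    my $bounds = pack( 'n4', $w, 0, $h, 0 ) x ( $max + 1 );
    for my $y ( 0 .. $h - 1 ) {
        my @labels = unpack 'v*', substr( $str, 2 * $y * $w, 2 * $w );
        for my $x ( 0 .. $#labels ) {
            my $c = $labels[$x] or next;    # skip background
            vec( $bounds, 4*$c+0, 16 ) = $x if $x < vec( $bounds, 4*$c+0, 16 );
            vec( $bounds, 4*$c+1, 16 ) = $x if $x > vec( $bounds, 4*$c+1, 16 );
            vec( $bounds, 4*$c+2, 16 ) = $y if $y < vec( $bounds, 4*$c+2, 16 );
            vec( $bounds, 4*$c+3, 16 ) = $y if $y > vec( $bounds, 4*$c+3, 16 );
        }
    }
    return $bounds;
}

my $bounds = bounds_from_labels( $str, $w, $h, 2 );
for my $c ( 1 .. 2 ) {
    my ( $x0, $x1, $y0, $y1 ) = unpack 'n4', substr( $bounds, 8 * $c, 8 );
    print "label $c: x $x0..$x1, y $y0..$y1\n";
}
```

The whole result lives in one string, so returning it from a thread copies a single scalar rather than an array of arrays; the caller unpacks each record with 'n4' as shown in the final loop.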

With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday'
Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
"Science is about questioning the status quo. Questioning authority". The enemy of (IT) success is complexity.
In the absence of evidence, opinion is indistinguishable from prejudice.

Re^2: Why this code is so slow if run in thread?
by vr (Curate) on Dec 12, 2016 at 07:44 UTC

      Results of my workaround on your test image:

      C:\test>1177606 vrtest.png
      C:\test>1177606 vrtest.png
      No thread
      ---------
      Took:0.818306923
      Count: 145
      Thread
      ---------
      Took:2.834208012
      Count: 145

        You are right about "substr" being unexpectedly slow with utf strings in threads. This program takes 12 seconds on my machine (I wanted some Greek letters, but it looks like they have been replaced with ugly codes; I think the idea is clear):

        use utf8;
        use threads;
        threads-> create( sub {
            $s = '&#945;&#946;&#947;&#948;' x 1000_000;
            substr( $s, 0, 1000 ) for 1 .. 1000;
        })-> join;
        print time - $^T;

        But then a simple solution is to take the substring first and only then decode it, i.e. to move "decode" inside the loop. Then everything works as expected.
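
A minimal sketch of that "substring first, decode second" order, assuming the data is held as UTF-8 encoded bytes and that slice boundaries fall on character boundaries. The helper name decoded_slice and the sample data are illustrative only, not taken from the original program.

```perl
use strict;
use warnings;
use Encode qw( decode );

# Hypothetical data: a large buffer of still-encoded bytes
# (the four Greek letters alpha..delta as UTF-8, two bytes each).
my $bytes = "\xCE\xB1\xCE\xB2\xCE\xB3\xCE\xB4" x 100_000;

# Slice the raw byte string first, then decode only that small piece,
# so substr never has to work on a long character (utf8-flagged) string.
sub decoded_slice {
    my ( $bytes, $offset, $length ) = @_;
    my $piece = substr( $bytes, $offset, $length );
    return decode( 'UTF-8', $piece );
}

my $chunk = decoded_slice( $bytes, 0, 1000 );
printf "%d characters decoded\n", length $chunk;
```

Offsets and lengths here are counted in bytes, not characters, so with a variable-width encoding the caller has to know where the character boundaries are.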