in reply to Fast sliding submatrix sums with PDL (inspired by PWC 248 task 2)
The referenced challenge was inspired by the example given in PDL::Broadcasting, especially by the usage of range. Accordingly my own solution looks similar to the example:
use strict; use warinings; use PDL; use PDL::NiceSlice; sub submatrix_sum { my $m = pdl @_; $m->range(ndcoords($m(1:,1:)), 2)->reorder(2, 3, 0, 1) ->clump(2)->sumover; }
It can easily be extended to WxH submatrices:
sub submatrix_sum { my $w = shift; my $h = shift; my $m = pdl @_; $m->range(ndcoords($m($w - 1:, $h - 1:), [$w, $h])->reorder(2, 3, +0, 1) ->clump(2)->sumover; }
Greetings,
-jo
|
---|
Replies are listed 'Best First'. | |
---|---|
Re^2: Fast sliding submatrix sums with PDL (inspired by PWC 248 task 2)
by Anonymous Monk on Dec 31, 2023 at 16:07 UTC | |
Thanks for great challenge, first of all (you were our taskmaster, right?) I didn't mention your solution, because it was kind of slow for comparisons. But now I see where it and challenge itself have originated. PDL is very TIMTOWTDI-ish, and range is ultimately important tool, to extract/address rectangular areas from ndarrays of any dimensions. But, eh-hm, don't you see that the POD you linked sings praises to wonders of broadcasting, which (i.e. broadcasting) you simply discarded? Broadcasting really only happens in this fragment:
which you replaced with
(Frankly, it's obvious I think that huge speed difference of Game-of-Life implementations in the linked tutorial is due to the way looping was generally performed rather than this "broadcasting" only, -- but perhaps that's how tutorials work.) Consider:
(I took liberty to use couple of numbers as args to ndcoords instead of matrix/slice, which only serves as source of these 2 numbers). Note, the matrix is now smaller than in previous tests. Both A and B versions are very much slower than the so far slowest "naive" variant. Though ndcoords builds a relatively large ndarray to feed to range, I think range is simply not written with speed/performance as its goal. It's actually tempting to try to improve Game of Life PDL implementation from the tutorial:
Sorry about crude profiling/tests; and improvement is somewhat far from what I expected. Even with "core time" singled out -- because next gen calculation is not very efficient (e.g. $n == 3 array is built twice), but that's another story -- which is "only" 8x better. Maybe all this glueing/appending to maintain constant matrix size and "wrap around" at edges takes its toll. | [reply] [d/l] [select] |
by jo37 (Curate) on Dec 31, 2023 at 17:25 UTC | |
Hi Steve (assuming it's you), I haven't spent any effort on optimizing the solution. Thank you for running the tests! It is interesting to see that ->clump(2)->sumover doesn't perform as good as ->sumover->sumover. Actually I didn't look up the example in PDL::Broadcasting but wrote my solution from memory. Looks like I'm getting older :-) Greetings, | [reply] [d/l] [select] |
by Anonymous Monk on Jan 02, 2024 at 12:24 UTC | |
Here is last (hopefully) instalment in this thread; it only concerns faster, compared to the PDL CPAN tutorial, "Game of Life" implementation. But maybe some features used are too advanced for a tutorial. Improvement comes from:
| [reply] [d/l] |
by Anonymous Monk on Jan 03, 2024 at 12:02 UTC | |
Oops, sorry, one more: subtraction is not required of course, and, more importantly, next gen calculation expression can be split in 2 lines -- same 4 ops but presumably done in-place i.e. faster:
| [reply] [d/l] |