Interesting. FFT for this task, who would have thought a few days ago. Looks like it runs in almost constant time, regardless of kernel size; and of course slower than special solutions. I had to add a couple of checks to accommodate for (unimportant) corner test case of submatrix with dims (4,3) and example matrix.
sub sms_WxH_PDL_fftconvolve ( $m, $w, $h ) {
my ( $W, $H ) = $m-> dims;
my $small = ones( $w%2?$w:$w+1, $h%2?$h:$h+1 );
$small-> slice( -1 ) .= 0 unless $w%2; # zero row and/or column
+ for even kernels
$small-> slice( [],-1 ) .= 0 unless $h%2;
my $result = $m-> copy;
$result = append( $result, 0 ) if $W == $w and not $W % 2;
$result = append( $result-> transpose, 0 )-> transpose
if $H == $h and not $H % 2;
my $kernel = kernctr( $result, $small ); #full kernel
$result-> fftconvolve( $kernel );
$result = $result-> slice( '0:-2','' ) if $W == $w and not $W % 2;
$result = $result-> slice( '','0:-2' ) if $H == $h and not $H % 2;
return $result-> slice(
[floor(($w-1)/2),floor(-($w+1)/2)],
[floor(($h-1)/2),floor(-($h+1)/2)]
)
}
__END__
Time (s) vs. N (NxN submatrix, PDL: Double D [1500,1500] matrix)
+-----------------------------------------------------------+
| + + + + + + + + |
| A |
| |
| |
2 |-+ +-|
| B B B B B |
| BB B B B B B B |
| B B B B B |
| C C |
| C C |
1.5 |-+ A C +-|
| C |
| C |
| |
| C |
1 |-+ C +-|
| A |
| C C |
| |
| |
| C |
0.5 |-+ A +-|
| C |
| C |
| A |
| D D |
| CC D D D D D D D D D D D D |
0 |-+ D +-|
| + + + + + + + + |
+-----------------------------------------------------------+
0 200 400 600 800 1000 1200 1400
sms_WxH_PDL_conv2d A
sms_WxH_PDL_fftconvolve B
sms_WxH_PDL_lags C
sms_WxH_PDL_sliding D
+------+-------+-------+-------+-------+
| N | A | B | C | D |
+------+-------+-------+-------+-------+
| 2 | 0.023 | 1.792 | 0.021 | |
| 4 | 0.065 | 1.854 | 0.036 | |
| 8 | 0.237 | 1.896 | 0.031 | |
| 12 | 0.534 | 1.865 | 0.049 | |
| 16 | 0.930 | 1.740 | 0.039 | |
| 20 | 1.432 | 1.812 | 0.057 | |
| 25 | 2.224 | 1.716 | 0.081 | 0.099 |
| 100 | | 1.899 | 0.380 | 0.107 |
| 200 | | 1.779 | 0.826 | 0.075 |
| 300 | | 1.836 | 1.143 | 0.089 |
| 400 | | 1.760 | 1.393 | 0.065 |
| 500 | | 1.826 | 1.560 | 0.047 |
| 600 | | 1.740 | 1.659 | 0.036 |
| 700 | | 1.818 | 1.620 | 0.049 |
| 800 | | 1.919 | 1.589 | 0.031 |
| 900 | | 1.917 | 1.484 | 0.026 |
| 1000 | | 1.789 | 1.260 | 0.008 |
| 1100 | | 1.880 | 1.068 | 0.010 |
| 1200 | | 1.745 | 0.823 | 0.010 |
| 1300 | | 1.841 | 0.562 | 0.008 |
| 1400 | | 1.841 | 0.273 | 0.003 |
+------+-------+-------+-------+-------+
|