It was the following text (again from the Wiki) that gave me the idea:
On a computer, one can often avoid explicitly transposing a matrix in memory by simply accessing the same data in a different order...
Because your reduction functions are simple (unlike, say, a Fourier transformation), I thought you might get away with this. You could also consider changing your data structure.
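To make the "different order" idea concrete, here is a minimal sketch in Python. It assumes the matrix is stored row by row in one flat list and that the reductions are simple ones such as `sum` or `max`. The names `data`, `reduce_rows` and `reduce_columns` are made up for illustration and are not taken from the original code in this thread.

```python
# Hypothetical 3x4 matrix stored row by row in a single flat list.
ROWS, COLS = 3, 4
data = [1, 2, 3, 4,
        5, 6, 7, 8,
        9, 10, 11, 12]


def reduce_rows(flat, rows, cols, fn):
    """Apply fn to each row: row r occupies flat[r*cols : (r+1)*cols]."""
    return [fn(flat[r * cols:(r + 1) * cols]) for r in range(rows)]


def reduce_columns(flat, rows, cols, fn):
    """Apply fn to each column without building a transposed matrix.

    Column c is every cols-th element starting at index c, so a strided
    slice reads the same data in a different order. Only one column is
    materialised at a time, never the whole transpose.
    """
    return [fn(flat[c::cols]) for c in range(cols)]


row_sums = reduce_rows(data, ROWS, COLS, sum)
col_sums = reduce_columns(data, ROWS, COLS, sum)
col_max = reduce_columns(data, ROWS, COLS, max)

print(row_sums)  # [10, 26, 42]
print(col_sums)  # [15, 18, 21, 24]
print(col_max)   # [9, 10, 11, 12]
```

Whether this pays off depends on the size of the matrix and on how the data is actually laid out. For very large matrices the strided access pattern can be less cache-friendly than row-wise access, so it is worth measuring before committing to it.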
In reply to Re^5: An efficient, scalable matrix transformation algorithm
by dHarry
in thread An efficient, scalable matrix transformation algorithm
by Luftkissenboot