Greetings, Monks. I hope you all had a very pleasant Kwanzaa. I am seeking your advice on the most (or at least, a very) efficient method to perform the following transformation:
Given a rank-2 matrix, apply an individual consolidation function to each column, producing one "row" of data in which each field is the result of applying its column's function.
    F1 F2 F3 F4 F5 F6
    :
    a1 b1 c1 d1 e1 g1    ->    A1 B1 C1 D1 E1 G1
    a2 b2 c2 d2 e2 g2
    a3 b3 c3 d3 e3 g3
    a4 b4 c4 d4 e4 g4
    a5 b5 c5 d5 e5 g5
    a6 b6 c6 d6 e6 g6
    a7 b7 c7 d7 e7 g7
    a8 b8 c8 d8 e8 g8
Parameters: The source data always has the same number of columns, and the same reduction functions are always applied to the same columns. The data arrives in "chunks"; as each chunk arrives, the consolidation functions must be applied to it and one "record" emitted. Examples of consolidation functions are min(), max(), avg(), sum(), etc.
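To make that contract concrete, here is a minimal sketch (purely illustrative: the four-column width, the particular functions, and the consolidate_chunk name are invented for the example, not taken from my actual code):

use strict;
use warnings;
use List::Util qw(min max sum);

# One consolidation function per column (four columns here just for brevity);
# each one receives the full list of values from its column.
my @consolidators = (
    \&min,                   # column 0: minimum
    \&max,                   # column 1: maximum
    sub { sum(@_) / @_ },    # column 2: average
    \&sum,                   # column 3: sum
);

# Reduce one chunk (an array ref of row array refs) to a single record.
sub consolidate_chunk {
    my ($chunk) = @_;
    my @record;
    for my $col ( 0 .. $#consolidators ) {
        my @column = map { $_->[$col] } @$chunk;         # slice out the column
        push @record, $consolidators[$col]->(@column);   # apply its function
    }
    return \@record;
}

# Example: one chunk of three rows.
my $chunk  = [ [ 1, 5, 2, 10 ], [ 3, 7, 4, 20 ], [ 2, 6, 6, 30 ] ];
my $record = consolidate_chunk($chunk);
print "@$record\n";    # prints: 1 7 4 60

Each call to consolidate_chunk is independent, so chunks can be processed as they arrive.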
(This is not homework; it's just a potentially computationally expensive problem with a nearly infinite number of Bad™ solutions, and I'd like to do the Right Thing™ and perhaps learn and share something new in the process.)
Update: Code sample in this reply: Re^2: An efficient, scalable matrix transformation algorithm
My current, already-implemented and workable solution roughly follows these steps:

1. Buffer one "chunk" of incoming rows.
2. Transpose the chunk so that each column becomes a row.
3. Apply each consolidation function to its corresponding transposed row.
4. Emit the results as one output "record."
It works "fair to middling," but the matrix transposition step itself adds significant overhead (I used a decent algorithm, though admittedly it is in pure Perl, as I couldn't find an XS implementation).
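For reference, a stripped-down sketch of that transpose-then-reduce shape (the real code is in the reply linked above; this naive pure-Perl transpose and the sample functions are only illustrative):

use strict;
use warnings;
use List::Util qw(min max sum);

my @consolidators = ( \&min, \&max, sub { sum(@_) / @_ }, \&sum );

# Naive pure-Perl transpose: array of rows -> array of columns.
sub transpose {
    my ($rows) = @_;
    my @cols;
    for my $r ( 0 .. $#$rows ) {
        for my $c ( 0 .. $#{ $rows->[$r] } ) {
            $cols[$c][$r] = $rows->[$r][$c];
        }
    }
    return \@cols;
}

# Transpose the whole chunk up front, then reduce each transposed row.
sub consolidate_chunk {
    my ($chunk) = @_;
    my $cols = transpose($chunk);
    return [ map { $consolidators[$_]->( @{ $cols->[$_] } ) } 0 .. $#consolidators ];
}

my $record = consolidate_chunk( [ [ 1, 5, 2, 10 ], [ 3, 7, 4, 20 ], [ 2, 6, 6, 30 ] ] );
print "@$record\n";    # prints: 1 7 4 60

The only real difference from the sketch above is that the entire transposed matrix is materialized before any reduction runs, which is exactly where the extra overhead shows up.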
The issue is that this is an O(n*m) solution. This should be well-covered territory, as I can hardly be the first person in the history of computing, mathematics, or stream/signal processing to need this type of transformation, so I can't help thinking that there must be a more efficient solution.
NOTE: Technically, the transposition step, while adding significant overhead, may actually result in a net reduction in total time, because some of the reduction functions do not operate over the full set of supplied data and may return immediately or after only partial processing. So it ends up being not quite as bad as a full O(n*m) pass. Without the transposition, that early exit would not be possible.
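To illustrate what I mean by "return immediately or after partial processing" (these function names are hypothetical, not from my code): once a consolidation function is handed its whole column, e.g. as an array ref produced by the transposition, some functions can stop without scanning all of it.

use strict;
use warnings;

# Hypothetical consolidation functions that take a whole column as an array ref.

# Needs only the first value of the column; returns immediately.
sub first_value {
    my ($col) = @_;
    return $col->[0];
}

# Scans only until it finds the first defined value, then stops.
sub first_defined {
    my ($col) = @_;
    for my $v (@$col) {
        return $v if defined $v;
    }
    return undef;
}

print first_value( [ 42, 1, 2, 3 ] ), "\n";             # 42, after touching one element
print first_defined( [ undef, undef, 7, 8 ] ), "\n";    # 7, after touching three elements

With the flat-list calling convention from the earlier sketches this saving disappears, since the whole column has to be gathered before the function is even called.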
I thus humbly throw myself upon the mercy of the monks to discuss possible alternative solutions to this problem. TBH, it's the algorithm that I am mostly interested in: brute force can always be sped up, but within the algorithm lies the beauty.
------
FYI, here is a list of some other things I have considered, tried, or looked into:
(You can see from these that I am more an engineer than a mathematician, but I am not stupid, and I am willing to learn.)