in reply to Fast matrix multiplication

If you have a sparse matrix (mostly zeros), there are algorithms that will speed things up considerably; search for "sparse matrix multiplication". If not, you'll probably have to hook in some compiled code (as you have done) - to get a little extra speed you might try SIMD extensions (MMX for integer data, SSE for floating point).
Update:
Here's the basic idea behind sparse matrix multiplication. Each entry of the product is the dot product of a row of the first matrix and a column of the second, so the job reduces to fast dot products of sparse vectors. Collapse each vector into a list of its non-zero elements and a corresponding list of their indexes in the original vector. Then traverse the two index lists for the vectors you are multiplying - you only have to multiply elements where the indexes match up. The indexes that contribute are the intersection of the two index lists, and the result is the sum of the products at those indexes. (Obviously, this only helps if most of the elements are zeros - however, that's fairly common in the sciences.)
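
To make that concrete, here is a rough sketch in Python (not tuned, and not taken from any particular library). The names to_sparse, sparse_dot and sparse_matmul are made up for illustration, and it assumes the matrices arrive as plain lists of lists. The index lists are kept in ascending order so the two can be walked in step, like a merge.

```python
def to_sparse(vec):
    """Collapse a dense vector into (indexes, values) of its non-zero elements."""
    indexes = [i for i, x in enumerate(vec) if x != 0]
    values = [vec[i] for i in indexes]
    return indexes, values


def sparse_dot(a, b):
    """Dot product of two sparse vectors given as (indexes, values) pairs.

    Both index lists must be in ascending order (to_sparse guarantees this).
    """
    a_idx, a_val = a
    b_idx, b_val = b
    i = j = 0
    total = 0
    while i < len(a_idx) and j < len(b_idx):
        if a_idx[i] == b_idx[j]:
            # Indexes match: this is the only case that needs a multiply.
            total += a_val[i] * b_val[j]
            i += 1
            j += 1
        elif a_idx[i] < b_idx[j]:
            i += 1  # a has a non-zero where b is zero; skip it
        else:
            j += 1  # b has a non-zero where a is zero; skip it
    return total


def sparse_matmul(A, B):
    """Multiply dense list-of-lists matrices A (n x m) and B (m x p).

    Hypothetical helper: rows of A and columns of B are collapsed once,
    then every output entry is a sparse dot product.
    """
    rows = [to_sparse(r) for r in A]
    cols = [to_sparse(c) for c in zip(*B)]  # zip(*B) yields the columns of B
    return [[sparse_dot(r, c) for c in cols] for r in rows]
```

For example, sparse_matmul([[1, 0, 0], [0, 0, 2]], [[0, 3], [4, 0], [0, 5]]) gives [[0, 3], [0, 10]]. In pure Python the bookkeeping may well cost more than it saves on small or dense inputs; the same traversal in compiled code is where the win would show up.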