For PDL matrix operations, I don't think it's possible to get better performance than using PDL::LinearAlgebra, which wraps LAPACK (and the best-performing implementation of that is probably OpenBLAS, which ships an optimized BLAS together with LAPACK).
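For anyone who hasn't used it, here is a minimal sketch of what calling PDL::LinearAlgebra looks like. It assumes PDL and PDL::LinearAlgebra are installed. Which LAPACK/BLAS (e.g. OpenBLAS) ends up doing the work is decided when PDL::LinearAlgebra is built and linked, not in the Perl code. The 2x2 matrix is just made-up illustration data.

```perl
use strict;
use warnings;
use PDL;                 # core PDL, provides the overloaded 'x' matrix multiply
use PDL::LinearAlgebra;  # LAPACK-backed routines such as minv

# Hypothetical example matrix (illustration only)
my $A = pdl([[4, 7], [2, 6]]);

# minv inverts a square matrix via LAPACK LU factorization (getrf/getri)
my $Ainv = minv($A);

# Multiplying back should give (numerically) the identity matrix
my $I = $A x $Ainv;
print $I;
```

Swapping in a faster LAPACK then speeds up code like this without any source changes.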
Comment on Re^2: Supervised machine learning algo for text matching across two files