Thank you for the enlightenment on using PDL::LinearAlgebra::Real. I updated the examples.
Passing a flag to the script will attempt to load PDL::LinearAlgebra::Real.
If available, PDL::LinearAlgebra::Real computes faster via LAPACK/OpenBLAS.
Use PDL 2.077 or later for best results. Also check that your OpenBLAS build is OpenMP-enabled, e.g.:
pkg-config --variable=openblas_config openblas | grep -c USE_OPENMP
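If you would rather get a readable answer than a count, the same check can be wrapped in a small script. This is only a sketch: the helper name has_openmp is made up for illustration, and it assumes (as the one-liner above does) that an OpenMP build shows a USE_OPENMP token in the openblas_config variable.

```shell
#!/bin/sh
# has_openmp CONFIG_TEXT
# Hypothetical helper: succeeds if the text contains a USE_OPENMP token,
# the same match the grep -c one-liner performs.
has_openmp() {
    printf '%s\n' "$1" | grep -q 'USE_OPENMP'
}

# Ask pkg-config for the build settings; use empty text if it knows
# nothing about openblas (no openblas.pc installed).
cfg=$(pkg-config --variable=openblas_config openblas 2>/dev/null || true)

if has_openmp "$cfg"; then
    echo "OpenBLAS reports USE_OPENMP"
else
    echo "USE_OPENMP not found (or pkg-config has no openblas entry)"
fi
```

On a machine without the pkg-config entry this prints the second message rather than failing, so it is safe to run before installing anything.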
perl matmult_base.pl 4096 # 54.685s built-in matrix multiply
perl matmult_base.pl 4096 1 # 6.706s LAPACK/OpenBLAS 1 thread
perl matmult_base.pl 4096 4 # 1.727s LAPACK/OpenBLAS 4 threads
perl matmult_mce_d.pl 4096 4 # 12.468s built-in matrix multiply
perl matmult_mce_d.pl 4096 4 1 # 1.915s LAPACK/OpenBLAS 4 threads
perl matmult_mce_f.pl 4096 4 # 11.950s built-in matrix multiply
perl matmult_mce_f.pl 4096 4 1 # 1.836s LAPACK/OpenBLAS 4 threads
perl matmult_mce_t.pl 4096 4 # 12.245s built-in matrix multiply
perl matmult_mce_t.pl 4096 4 1 # 1.856s LAPACK/OpenBLAS 4 threads
perl matmult_simd.pl 4096 4 # 16.136s built-in matrix multiply
perl matmult_simd.pl 4096 4 1 # 1.763s LAPACK/OpenBLAS 4 threads
perl strassen_07_f.pl 4096 # 3.516s built-in matrix multiply
perl strassen_07_f.pl 4096 1 # 1.915s LAPACK/OpenBLAS 7 threads
perl strassen_07_t.pl 4096 # 3.658s built-in matrix multiply
perl strassen_07_t.pl 4096 1 # 2.072s LAPACK/OpenBLAS 7 threads
Look at matmult_base.pl go :) This is possible with OpenMP-enabled LAPACK/OpenBLAS libs.
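To put a number on that, here is a quick sketch that turns the matmult_base.pl timings listed above into speedup factors. The speedup function is just an illustrative helper built on awk; the three timings are copied from the runs in this post.

```shell
#!/bin/sh
# speedup BASE_SECONDS FAST_SECONDS
# Illustrative helper: prints BASE/FAST rounded to one decimal place.
speedup() {
    awk -v base="$1" -v fast="$2" 'BEGIN { printf "%.1f\n", base / fast }'
}

# Timings in seconds for matmult_base.pl 4096, taken from the list above:
# 54.685 built-in, 6.706 LAPACK/OpenBLAS 1 thread, 1.727 with 4 threads.
echo "1 thread:  $(speedup 54.685 6.706)x"
echo "4 threads: $(speedup 54.685 1.727)x"
```

That works out to roughly 8x with one thread and about 32x with four threads over the built-in matrix multiply.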