Thank you for the enlightenment on using PDL::LinearAlgebra::Real. I updated the examples.
Passing a flag to a script makes it attempt to load PDL::LinearAlgebra::Real. If the module is available, the matrix multiply runs faster via LAPACK/OpenBLAS. Use PDL 2.077 or later for best results. Also check that your OpenBLAS is OpenMP-enabled, e.g. pkg-config --variable=openblas_config openblas | grep -c USE_OPENMP
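The OpenMP check above can be wrapped in a small script. This is only a sketch: the function names are made up for illustration, and it assumes the openblas_config string is a list of KEY=VALUE pairs, so it looks for USE_OPENMP=1 rather than the bare name (a slightly stricter test than the grep, in case the key is listed with an empty value).

```shell
#!/bin/sh
# Hypothetical helper: succeeds if an OpenBLAS config string contains
# USE_OPENMP=1. Assumes KEY=VALUE pairs in the string.
config_has_openmp() {
    case "$1" in
        *USE_OPENMP=1*) return 0 ;;
        *)              return 1 ;;
    esac
}

# Ask pkg-config for the OpenBLAS build configuration.
# Prints nothing if pkg-config or openblas.pc is not installed.
openblas_config() {
    pkg-config --variable=openblas_config openblas 2>/dev/null || true
}

# Print a one-line summary for the local OpenBLAS install.
report_openblas_openmp() {
    if config_has_openmp "$(openblas_config)"; then
        echo "OpenBLAS: OpenMP enabled"
    else
        echo "OpenBLAS: OpenMP not detected"
    fi
}

report_openblas_openmp
```

If it reports "not detected", the multi-threaded LAPACK/OpenBLAS timings may not be reproducible on that machine.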
perl matmult_base.pl 4096         # 54.685s built-in matrix multiply
perl matmult_base.pl 4096 1       #  6.706s LAPACK/OpenBLAS 1 thread
perl matmult_base.pl 4096 4       #  1.727s LAPACK/OpenBLAS 4 threads

perl matmult_mce_d.pl 4096 4      # 12.468s built-in matrix multiply
perl matmult_mce_d.pl 4096 4 1    #  1.915s LAPACK/OpenBLAS 4 threads

perl matmult_mce_f.pl 4096 4      # 11.950s built-in matrix multiply
perl matmult_mce_f.pl 4096 4 1    #  1.836s LAPACK/OpenBLAS 4 threads

perl matmult_mce_t.pl 4096 4      # 12.245s built-in matrix multiply
perl matmult_mce_t.pl 4096 4 1    #  1.856s LAPACK/OpenBLAS 4 threads

perl matmult_simd.pl 4096 4       # 16.136s built-in matrix multiply
perl matmult_simd.pl 4096 4 1     #  1.763s LAPACK/OpenBLAS 4 threads

perl strassen_07_f.pl 4096        #  3.516s built-in matrix multiply
perl strassen_07_f.pl 4096 1      #  1.915s LAPACK/OpenBLAS 7 threads

perl strassen_07_t.pl 4096        #  3.658s built-in matrix multiply
perl strassen_07_t.pl 4096 1      #  2.072s LAPACK/OpenBLAS 7 threads
Look at matmult_base.pl go :) This is possible with OpenMP-enabled LAPACK/OpenBLAS libs.
In reply to Re^7: XS module in ithreads Perl much slower in threads::join after adding SvOBJECT_off
by marioroy
in thread XS module in ithreads Perl much slower in threads::join after adding SvOBJECT_off
by etj