in reply to Re^6: XS module in ithreads Perl much slower in threads::join after adding SvOBJECT_off

Thank you for the enlightenment on using PDL::LinearAlgebra::Real. I have updated the examples.

Passing a flag to the script will attempt to load PDL::LinearAlgebra::Real.
If available, PDL::LinearAlgebra::Real computes faster via LAPACK/OpenBLAS.
Use PDL 2.077 or later for best results. Also check that your OpenBLAS build is OpenMP-enabled, e.g.:
pkg-config --variable=openblas_config openblas | grep -c USE_OPENMP

perl matmult_base.pl  4096        # 54.685s built-in matrix multiply
perl matmult_base.pl  4096 1      #  6.706s LAPACK/OpenBLAS 1 thread
perl matmult_base.pl  4096 4      #  1.727s LAPACK/OpenBLAS 4 threads

perl matmult_mce_d.pl 4096 4      # 12.468s built-in matrix multiply
perl matmult_mce_d.pl 4096 4 1    #  1.915s LAPACK/OpenBLAS 4 threads

perl matmult_mce_f.pl 4096 4      # 11.950s built-in matrix multiply
perl matmult_mce_f.pl 4096 4 1    #  1.836s LAPACK/OpenBLAS 4 threads

perl matmult_mce_t.pl 4096 4      # 12.245s built-in matrix multiply
perl matmult_mce_t.pl 4096 4 1    #  1.856s LAPACK/OpenBLAS 4 threads

perl matmult_simd.pl  4096 4      # 16.136s built-in matrix multiply
perl matmult_simd.pl  4096 4 1    #  1.763s LAPACK/OpenBLAS 4 threads

perl strassen_07_f.pl 4096        #  3.516s built-in matrix multiply
perl strassen_07_f.pl 4096 1      #  1.915s LAPACK/OpenBLAS 7 threads

perl strassen_07_t.pl 4096        #  3.658s built-in matrix multiply
perl strassen_07_t.pl 4096 1      #  2.072s LAPACK/OpenBLAS 7 threads

Look at matmult_base.pl go :) This is possible with OpenMP-enabled LAPACK/OpenBLAS libs.
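
For anyone wanting a slightly stricter version of the pkg-config check, here is a minimal sketch. It assumes the openblas_config string is a space-separated list of KEY=VALUE tokens and that an OpenMP build reports USE_OPENMP=1; I have not confirmed that format for every distribution, so treat the result as a hint. The openmp_status helper is a hypothetical name, not part of any tool.

```shell
#!/bin/sh
# Classify an openblas_config string.
# Assumption: tokens are space-separated and an OpenMP build has USE_OPENMP=1.
openmp_status() {
    case " $1 " in
        *" USE_OPENMP=1 "*) echo "enabled" ;;   # explicit USE_OPENMP=1 token
        *"USE_OPENMP"*)     echo "disabled" ;;  # key present, but not set to 1
        *)                  echo "unknown" ;;   # key missing or no config found
    esac
}

# Query pkg-config only when it and the openblas package are available.
if command -v pkg-config >/dev/null 2>&1 && pkg-config --exists openblas; then
    cfg=$(pkg-config --variable=openblas_config openblas)
else
    cfg=""
fi

echo "OpenBLAS OpenMP support: $(openmp_status "$cfg")"
```

Unlike a plain grep -c, this separates a USE_OPENMP token that is present but empty or 0 from one that is actually set to 1.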
