
I have implemented this on the current git master branch and intend to release it very soon, after a couple more tweaks to the demos system, which has finally been overhauled. Thanks for the amazing research!

Re^7: XS module in ithreads Perl much slower in threads::join after adding SvOBJECT_off
by marioroy (Prior) on Mar 02, 2022 at 22:14 UTC

    Thank you for the enlightenment on using PDL::LinearAlgebra::Real. I updated the examples.

    When a flag is passed, the script attempts to load PDL::LinearAlgebra::Real.
    If available, PDL::LinearAlgebra::Real computes faster via LAPACK/OpenBLAS.
    Use PDL 2.077 or later for best results. Also check that OpenBLAS is OpenMP-enabled, e.g.:
    pkg-config --variable=openblas_config openblas | grep -c USE_OPENMP
    
    perl matmult_base.pl  4096        # 54.685s built-in matrix multiply
    perl matmult_base.pl  4096 1      #  6.706s LAPACK/OpenBLAS 1 thread
    perl matmult_base.pl  4096 4      #  1.727s LAPACK/OpenBLAS 4 threads
    
    perl matmult_mce_d.pl 4096 4      # 12.468s built-in matrix multiply
    perl matmult_mce_d.pl 4096 4 1    #  1.915s LAPACK/OpenBLAS 4 threads
    
    perl matmult_mce_f.pl 4096 4      # 11.950s built-in matrix multiply
    perl matmult_mce_f.pl 4096 4 1    #  1.836s LAPACK/OpenBLAS 4 threads
    
    perl matmult_mce_t.pl 4096 4      # 12.245s built-in matrix multiply
    perl matmult_mce_t.pl 4096 4 1    #  1.856s LAPACK/OpenBLAS 4 threads
    
    perl matmult_simd.pl  4096 4      # 16.136s built-in matrix multiply
    perl matmult_simd.pl  4096 4 1    #  1.763s LAPACK/OpenBLAS 4 threads
    
    perl strassen_07_f.pl 4096        #  3.516s built-in matrix multiply
    perl strassen_07_f.pl 4096 1      #  1.915s LAPACK/OpenBLAS 7 threads
    
    perl strassen_07_t.pl 4096        #  3.658s built-in matrix multiply
    perl strassen_07_t.pl 4096 1      #  2.072s LAPACK/OpenBLAS 7 threads
    

    Look at matmult_base.pl go :) This is possible with OpenMP-enabled LAPACK/OpenBLAS libs.
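
For anyone who wants to script the OpenMP check before running the timings, here is a minimal sketch. It assumes the `openblas_config` string reported by pkg-config contains `USE_OPENMP=1` when OpenMP is enabled (the value may be empty otherwise); the function name `openblas_has_openmp` is hypothetical and not part of any of the scripts above.

```shell
#!/bin/sh
# Sketch only: report whether an OpenBLAS config string advertises OpenMP.
# Assumption: an OpenMP-enabled build shows "USE_OPENMP=1" in the string.
openblas_has_openmp() {
    # $1: the openblas_config string, e.g. as printed by pkg-config
    case "$1" in
        *USE_OPENMP=1*) echo yes ;;
        *)              echo no  ;;
    esac
}

# Query pkg-config; if pkg-config or the openblas entry is missing,
# the string is empty and the check below reports "no".
config=$(pkg-config --variable=openblas_config openblas 2>/dev/null)

if [ "$(openblas_has_openmp "$config")" = yes ]; then
    echo "OpenBLAS reports OpenMP support"
else
    echo "OpenMP support not detected in the OpenBLAS config"
fi
```

Matching on the value rather than only on the variable name avoids counting a build where `USE_OPENMP` is present but empty.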