OK, so I ran a lot of tests, placing debug timers at various points in the XS and Perl code and profiling with NYTProf. I think there are two problems:

  1. the Perl code that runs on every call to IO::Socket::SSL::sysread() is not optimized well enough to be executed tens of thousands of times per second or more (which mainly happens in non-blocking mode)
  2. too many SSLeay functions have to be called from Perl code every time an unproductive read is performed (Net::SSLeay::read() + a loop calling both Net::SSLeay::get_error() and Net::SSLeay::ERR_clear_error())

I managed to work around the first problem by re-implementing the SSL read logic directly in my Perl code, without calling any IO::Socket::SSL function, just by calling the SSLeay XS functions directly. This halved the average execution time of the unproductive reads. Performance still isn't satisfactory though, because of the second problem.
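
To give an idea of what that workaround looks like, here is a minimal sketch, not my exact code. The helper name raw_ssl_read and its (data, status) return convention are made up for this example, and it assumes $ssl is an already-established Net::SSLeay connection handle on a non-blocking socket. It does not update any IO::Socket::SSL internal state.

```perl
use strict;
use warnings;
use Net::SSLeay;

# Hypothetical helper (not part of IO::Socket::SSL or Net::SSLeay):
# one read attempt on an established Net::SSLeay handle.
# Returns ($data, 'ok') when data (or an empty string at EOF) was read,
# (undef, 'again') for an unproductive read, (undef, 'error') otherwise.
sub raw_ssl_read {
    my ($ssl, $max_len) = @_;

    # In scalar context Net::SSLeay::read() returns the data, or undef on failure.
    my $data = Net::SSLeay::read($ssl, $max_len);
    return ($data, 'ok') if defined $data;

    # Ask OpenSSL why the read failed, then empty the error queue so that
    # stale entries do not leak into the next call.
    my $err = Net::SSLeay::get_error($ssl, -1);
    Net::SSLeay::ERR_clear_error();

    # WANT_READ / WANT_WRITE just mean "retry later" on a non-blocking socket.
    return (undef, 'again')
        if $err == Net::SSLeay::ERROR_WANT_READ()
        || $err == Net::SSLeay::ERROR_WANT_WRITE();

    return (undef, 'error');
}
```

Even in this stripped-down form, every unproductive read still costs three calls from Perl into XS (read, get_error, ERR_clear_error), which is the second problem above.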

I guess the only way to address the second problem would be to re-implement the IO::Socket::SSL::_generic_read() function (or maybe the IO::Socket::SSL::sysread() function directly?) in XS instead of Perl. It doesn't seem possible to call that many external functions from Perl, in a loop executed tens of thousands of times per second, without the call overhead starting to show and hurting performance. If the main SSL read function, handling both the read and the error management (updating IO::Socket::SSL internals), were the only one that had to be called from Perl, I think we could get acceptable performance.

Some profiling data before and after optimization: