in reply to Re^2: Compile perl for performance
in thread Compile perl for performance

Hi lbe (learnedbyerror)!

Thanks for your informative response!
I think that we are "on the same page" and you know what you are doing!

You are quite correct to be suspicious of gcc's -O3 level. I TA an Advanced Assembly class whenever it "makes", which is only about every 5 years. It is a difficult class, and it takes years to get enough qualified students to justify running it.

Sometimes we play "beat the compiler". This is possible even at the highest optimization level. I agree that -O2 is "fairly safe". At -O3 the compiler gets increasingly bizarre in what it does: it writes ASM that no human would ever think of. It may even write code that winds up slower! The PhD guy I alluded to in an earlier post was one of our students.

Here is one suggestion which may or may not help you:

Your application is very DB intensive.
The DB will have two important general limits:

  1. the number of operations per second
  2. a much smaller number, the number of transactions per second
As it turns out, the commit of 1 million inserts doesn't take much longer than the commit of just one single insert.

See if you can reduce the number of DB transactions per second. This can have a huge impact upon performance! You may or may not be done with the first part of optimization (algorithm and coding enhancements).
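
A minimal sketch of the batching idea, under loudly stated assumptions: it uses Python's standard-library sqlite3 with an in-memory database as a stand-in, not Perl or LMDB, and the table `kv`, the helper `insert_batched`, and the row counts are all hypothetical names and values chosen for illustration. The point is only the shape of the loop: many inserts, few commits.

```python
import sqlite3


def make_db():
    """Create a throwaway in-memory database with one key/value table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE kv (k INTEGER PRIMARY KEY, v TEXT)")
    conn.commit()
    return conn


def insert_batched(conn, rows, batch_size):
    """Insert rows, committing once per batch_size rows.

    Returns the number of commits issued, so the effect of the
    batch size on the transaction count is easy to see.
    """
    commits = 0
    pending = 0
    cur = conn.cursor()
    for key, value in rows:
        cur.execute("INSERT INTO kv (k, v) VALUES (?, ?)", (key, value))
        pending += 1
        if pending >= batch_size:
            conn.commit()  # one transaction covers the whole batch
            commits += 1
            pending = 0
    if pending:
        conn.commit()  # flush the final partial batch
        commits += 1
    return commits


rows = [(i, f"value-{i}") for i in range(250_000)]
conn = make_db()
commits = insert_batched(conn, rows, batch_size=100_000)
count = conn.execute("SELECT COUNT(*) FROM kv").fetchone()[0]
print(commits, count)
```

With `batch_size=1` the same data would cost 250,000 commits instead of 3. How much wall-clock time that saves depends on the storage engine and its sync settings, so the batch size is worth measuring on the real workload rather than taken from this sketch.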

I suspect that there is still more than a 25% improvement that can be had without resorting to an optimized compile of Perl itself.

Re^4: Compile perl for performance
by learnedbyerror (Monk) on Aug 17, 2018 at 04:26 UTC

    Marshall

    Thanks for your informative follow-up. I picked up on the large transaction count optimization both from reading up on LMDB and from having used it previously with BerkeleyDB. I borrowed KENTNL's single_txn trick that he used in the CHI::Driver::LMDB module. This is what took a lot of the time off of the original 32 hours. Most of the remaining improvement was due to setting LMDB ReadMode true and setting the environment flags to MDB_WRITEMAP | MDB_NOMETASYNC | MDB_MAPASYNC | MDB_NORDAHEAD.

    I am currently committing after 100,000 inserts. I did some testing up to 1,000,000 and found insignificant improvements after 100,000, so I decided to keep 100,000 as my limit.

    Thanks again for your sage advice!

    lbe

      Hi lbe!

      "This is what took a lot of the time off of the original 32 hours." Hooray for you! I would have expected your performance increase, and your choice of 100,000 inserts per transaction sounds fine to me. Lots of folks are unaware that this can speed things up dramatically. Glad we covered that point, because that info may help others.

      Since we appear to be down to compiler choice and compile options for Perl, I will add a few comments... These various compilers are definitely not "equivalent".

      I don't have any recent benchmarks, but when I got involved in the SETI@home project (many, many moons ago), there was a lot of focus on getting the work units to run faster. The Intel compiler emerged as a clear winner over MS or gcc. There are a whole mess of architecture-specific flags which do make a significant difference. I am not sure, but there may even be "stepping"-specific flags. That the Intel compiler is faster makes sense to me. Intel is in the business of selling processors, and showing off their performance on various standard C benchmark programs is important to them.

      Some of Intel's instructions are hard for the compiler to use. For example, if you want to check whether two buffers are equal or move a buffer from address X to address Y, that is, in theory, a single machine instruction (ignoring byte alignment issues). The compiler has to realize that a C loop can be replaced with one machine instruction.

      In summary, if you really want to get max performance, I think you will need the Intel compiler but that costs $$. I don't know what your project budget is or even how much performance actually matters in an operational sense aside from the academic exercise of making it run faster just for the joy of doing that.