in reply to Re: Compile perl for performance
in thread Compile perl for performance

Marshall,

Thanks for your post. I agree with everything that you have written. I neglected to say that the code that I am testing has been around for a while. My first version took 32 hours to run. By refining the algorithms I was able to reduce that to about 12 hours. I then ran it through Devel::NYTProf and, over half a dozen or so iterations of profiling and algorithmic changes, reduced the run time to about 8 hours.
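For anyone following along, the basic Devel::NYTProf cycle looks like this (the script name is a placeholder):

```shell
# Run the program under the profiler; this writes ./nytprof.out
perl -d:NYTProf myscript.pl

# Turn nytprof.out into a browsable HTML report and open it
nytprofhtml --open
```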

At this point, I'm being greedy and seeing what else I can get. The vast majority of the time, about 80% of the run time, is spent in the LMDB driver writing to the database. The next chunk is about 10% for Sereal to serialize the HashRef that is written to the db. About 5% goes to parsing and analyzing the input data into that HashRef. The last 5% covers reading the input files and other miscellaneous work.

My belief, based upon observing perl magic at a distance, is that between 5.28, usemymalloc and -O3 there is a net improvement in I/O, XS integration and compiler optimization that gets me down to 6 hours.

If I had applied only the newer perl version and compiler optimization, I would be down to just 24 hours from 32. The vast majority of getting from 32 hours to 6, all but 2 hours of the reduction, is due to algorithmic improvement.

I am somewhat concerned about the possibility of instability that you mentioned. In my experience, -O2 is a reliable optimization level for gcc in general. I have run into problems with -O3, where it helped some code and actually made other code worse. One of the things that I love about App::perlbrew is that I can easily have multiple versions of perl installed. The version that I use every day is compiled with no additional flags. I usually also have one version available compiled with -O2 for those programs where testing shows I receive a needed boost.
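With perlbrew, keeping a plain build and an -O2 build side by side looks roughly like this (the version string and install name are illustrative):

```shell
# Everyday perl, default Configure flags
perlbrew install perl-5.28.0

# The same perl built with -O2, installed under its own name;
# -D options are passed straight through to Configure
perlbrew install perl-5.28.0 --as perl-5.28.0-O2 -Doptimize='-O2'

# Switch to the optimized build for the current shell only
perlbrew use perl-5.28.0-O2
```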

Thanks again for your advice!

lbe

Replies are listed 'Best First'.
Re^3: Compile perl for performance
by Marshall (Canon) on Aug 16, 2018 at 17:48 UTC
    Hi lbe (learnedbyerror)!

    Thanks for your informative response!
    I think that we are "on the same page" and you know what you are doing!

    You are quite correct to be suspicious of gcc -O3 level. I TA an Advanced Assembly class whenever it "makes", which is only about every 5 years. It is a difficult class and it takes years to get enough qualified students in order to justify running the class.

    Sometimes we play "beat the compiler". This is possible even at the highest optimization level. I agree that -O2 is "fairly safe". At -O3 the compiler gets increasingly bizarre in what it does - it writes ASM that no human would ever think of. It may even write code that winds up slower! The PhD guy I alluded to in an earlier post was one of our students.

    Here is one suggestion which may or may not help you:

    Your application is very DB intensive.
    The DB will have two important general limits:

    1. the number of operations per second
    2. a much smaller number, the number of transactions per second

    As it turns out, the commit of 1 million inserts doesn't take much longer than the commit of just one single insert.

    See if you can reduce the number of DB transactions per second. This can have a huge impact upon performance! You may or may not be done with the first part of optimization (algorithm and coding enhancements).
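    A minimal sketch of that batching pattern (the LMDB_File API details, the database path, and the next_record() helper are assumptions for illustration, not the actual code under discussion):

    ```perl
    use strict;
    use warnings;
    use LMDB_File;                       # LMDB bindings from CPAN

    my $env = LMDB::Env->new( 'db_dir',  # path is a placeholder
                              { mapsize => 10 * 1024 * 1024 * 1024 } );

    my $batch_size = 100_000;            # inserts per transaction
    my $count      = 0;

    my $txn = $env->BeginTxn;
    my $db  = $txn->OpenDB;

    while ( my ( $key, $value ) = next_record() ) {   # hypothetical input source
        $db->put( $key, $value );
        unless ( ++$count % $batch_size ) {
            $txn->commit;                # one commit amortized over many puts
            $txn = $env->BeginTxn;       # start the next batch
            $db  = $txn->OpenDB;
        }
    }
    $txn->commit;                        # flush the final partial batch
    ```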

    I suspect that there is still more than a 25% improvement that can be had without resorting to an optimized compile of Perl itself.

      Marshall

      Thanks for your informative follow-up. I picked up on the large-transaction optimization from reading up on LMDB as well as from using BerkeleyDB previously. I borrowed KENTNL's single_txn trick from his CHI::Driver::LMDB module. This is what took a lot of the time off of the original 32 hours. Most of the remaining improvement came from setting LMDB ReadMode true and setting the environment flags to MDB_WRITEMAP | MDB_NOMETASYNC | MDB_MAPASYNC | MDB_NORDAHEAD.

      I am currently committing after 100,000 inserts. I did some testing up to 1,000,000 and found insignificant improvement beyond 100,000, so I decided to keep 100,000 as my limit.

      Thanks again for your sage advice!

      lbe

        Hi lbe!

        "This is what took a lot of the time off of the original 32 hours." Hooray for you! I would have expected that performance increase, and your choice of 100,000 inserts per transaction sounds fine to me. Lots of folks are unaware that this can speed things up dramatically. Glad we covered that point because that info may help others.

        Since we appear to be down to compiler choice and compile options for Perl, I will add a few comments... These various compilers are definitely not "equivalent".

        I don't have any recent benchmarks, but when I got involved in the SETI@home project (many, many moons ago), there was a lot of focus on getting the work units to run faster. The Intel compiler emerged as a clear winner over MS or gcc. There is a whole mess of architecture-specific flags which do make a significant difference. I am not sure, but there may even be "stepping"-specific flags. That the Intel compiler is faster makes sense to me. Intel is in the business of selling processors, and showing off their performance on various standard C benchmark programs is important to them.

        Some of Intel's instructions are hard for the compiler to use. For example, comparing two buffers for equality, or moving a buffer from address X to address Y, is in theory a single machine instruction (ignoring byte-alignment issues) - think of the x86 rep cmps and rep movs string instructions. The compiler has to realize that a C loop can be replaced with one machine instruction.

        In summary, if you really want maximum performance, I think you will need the Intel compiler, but that costs $$. I don't know what your project budget is, or how much performance actually matters in an operational sense, aside from the academic exercise of making it run faster just for the joy of doing that.