in reply to Re^3: Compile perl for performance
in thread Compile perl for performance

Marshall

Thanks for your informative follow-up. I picked up the large-transaction-count optimization both from reading up on LMDB and from having used the same technique previously with BerkeleyDB. I borrowed KENTNL's single_txn trick from the CHI::Driver::LMDB module. This is what took a lot of the time off of the original 32 hours. Most of the remaining improvement came from setting LMDB ReadMode to true and setting the environment flags to MDB_WRITEMAP | MDB_NOMETASYNC | MDB_MAPASYNC | MDB_NORDAHEAD.

I am currently committing after every 100,000 inserts. I tested batch sizes up to 1,000,000 and found insignificant improvement beyond 100,000, so I decided to keep 100,000 as my limit.
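
For readers who want the shape of the batching pattern, here is a minimal sketch in C. It is not LMDB or LMDB_File code: `Store`, `put_record`, `commit_txn`, and `load_in_batches` are hypothetical stand-ins that only count operations, so the example shows the commit-every-N control flow and nothing about the real database API.

```c
#include <stddef.h>

/* Hypothetical stand-in for a key-value store; it only counts operations. */
typedef struct {
    size_t pending;  /* records written in the currently open transaction */
    size_t commits;  /* number of commits performed so far */
    size_t stored;   /* records made durable by a commit */
} Store;

/* Stand-in for a write inside the open transaction. */
void put_record(Store *s) {
    s->pending++;
}

/* Stand-in for committing the open transaction. */
void commit_txn(Store *s) {
    s->stored += s->pending;
    s->pending = 0;
    s->commits++;
}

/* Insert `total` records, committing once every `batch` inserts.
 * Returns the number of commits that were needed. */
size_t load_in_batches(Store *s, size_t total, size_t batch) {
    for (size_t i = 0; i < total; i++) {
        put_record(s);
        if (s->pending == batch)
            commit_txn(s);
    }
    if (s->pending > 0)  /* flush the final partial batch */
        commit_txn(s);
    return s->commits;
}
```

With a zeroed `Store`, loading 250,000 records at a batch size of 100,000 takes three commits instead of 250,000 single-record commits, which is where the time saving comes from.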

Thanks again for your sage advice!

lbe

Re^5: Compile perl for performance
by Marshall (Canon) on Aug 19, 2018 at 18:39 UTC
    Hi lbe!

    "This is what took a lot of the time off of the original 32 hours." Hooray for you! I would have expected your performance increase, and your choice of 100,000 inserts per transaction sounds fine to me. Lots of folks are unaware that batching inserts into larger transactions can speed things up dramatically. Glad we covered that point because that info may help others.

    Since we appear to be down to compiler choice and compile options for Perl, I will add a few comments... These various compilers are definitely not "equivalent".

    I don't have any recent benchmarks, but when I got involved in the SETI@home project (many, many moons ago), there was a lot of focus on getting the work units to run faster. The Intel compiler emerged as a clear winner over MS or gcc. There are a whole mess of architecture-specific flags which do make a significant difference. I am not sure, but there may even be "stepping"-specific flags. That the Intel compiler is faster makes sense to me. Intel is in the business of selling processors, and showing off their performance on various standard C benchmark programs is important to them.

    Some of Intel's instructions are hard for the compiler to use. For example, if you want to compare two buffers for equality or move a buffer from address X to address Y, that is in theory a single machine instruction (ignoring byte alignment issues). The compiler has to realize that a C loop can be replaced with one machine instruction.
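
To make that concrete, here is a small C sketch of the two loops in question next to their standard library equivalents. The function names are illustrative only. On x86 the string instructions usually meant here are `REP MOVSB` (copy) and `REPE CMPSB` (compare); whether a given compiler turns the hand-written loops into those instructions, into a `memcpy`/`memcmp` call, or into vectorized code depends on the compiler, the optimization flags, and the target, so treat this as an illustration rather than a statement about what any particular compiler emits.

```c
#include <stddef.h>
#include <string.h>

/* Hand-written copy loop: the optimizer has to recognize this idiom
 * before it can replace it with something faster. */
void copy_loop(unsigned char *dst, const unsigned char *src, size_t n) {
    for (size_t i = 0; i < n; i++)
        dst[i] = src[i];
}

/* Hand-written equality loop: returns 1 if the buffers match, else 0. */
int equal_loop(const unsigned char *a, const unsigned char *b, size_t n) {
    for (size_t i = 0; i < n; i++)
        if (a[i] != b[i])
            return 0;
    return 1;
}

/* Library versions: the intent is explicit, so the compiler and the
 * C library are free to pick the best implementation for the target. */
void copy_lib(unsigned char *dst, const unsigned char *src, size_t n) {
    memcpy(dst, src, n);
}

int equal_lib(const unsigned char *a, const unsigned char *b, size_t n) {
    return memcmp(a, b, n) == 0;
}
```

Both pairs give the same results; the difference is only in how much work the compiler must do to discover that a bulk operation is intended.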

    In summary, if you really want to get max performance, I think you will need the Intel compiler but that costs $$. I don't know what your project budget is or even how much performance actually matters in an operational sense aside from the academic exercise of making it run faster just for the joy of doing that.