PerlMonks  

Our perl/xs/c app is 30% slower with 64bit 5.24.0, than with 32bit 5.8.9. Why?

by Anonymous Monk
on Dec 21, 2016 at 16:30 UTC ( [id://1178305]=perlquestion )

Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

Monks,

For the last couple of days I have been struggling to identify the reason for a drop in performance when upgrading from 32bit 5.8.9 to 64bit 5.24.0 on Windows. The drop in performance is around the 30% mark, but in some tests it is as high as 80%. This is measured using our own performance test suite. I suspect I am missing something simple, so any suggestions would be helpful...

The application in question is a mix of perl/xs/c on Windows, but the backend can and does run on Linux (with the same test and performance suite). The rationale for going to 64 bits is that users are running out of memory when running large jobs. The jobs can take time - hours in some cases - so any drop in performance is a major issue. The application passes its comprehensive test suite on all versions of perl that I have tested it with. The compile flags are the same (-O2) in all cases. Versions of GCC are, however, different. When testing a 64bit build against a 32bit build with the same version of perl/gcc, the 64bit version is always faster. An overview:

  • 32bit Activestate 5.8.9, mingw 4.4.1 (our current version) is about 30% faster than 64bit Strawberry 5.24.0, mingw 4.9.2
  • 64bit Strawberry 5.24.0, mingw 4.9.2 is about 15% faster than 32bit Strawberry 5.24.0, mingw 4.9.2
  • 64bit Linux 5.24.0, gcc 4.8.1 is about 10% faster than 64bit Linux 5.8.9, gcc 4.8.1

I was unable to build a 32bit version of perl in my Linux environment. I am currently trying to get a 64bit version of Activestate 5.24 to work. Perl -V for 32bit Activestate 5.8.9 and 64bit Strawberry 5.24.0 is below.

Thoughts? Have I missed something simple?

Set up gcc environment - gcc.exe (TDM-1 mingw32) 4.4.1
Summary of my perl5 (revision 5 version 8 subversion 9) configuration:
  Platform:
    osname=MSWin32, osvers=5.00, archname=MSWin32-x86-multi-thread
    uname=''
    config_args='undef'
    hint=recommended, useposix=true, d_sigaction=undef
    usethreads=define use5005threads=undef useithreads=define usemultiplicity=define
    useperlio=define d_sfio=undef uselargefiles=define usesocks=undef
    use64bitint=undef use64bitall=undef uselongdouble=undef
    usemymalloc=n, bincompat5005=undef
  Compiler:
    cc='C:/MinGW32/bin/gcc.exe', ccflags ='-DNDEBUG -DWIN32 -D_CONSOLE -DNO_STRICT -DHAVE_DES_FCRYPT -DNO_HASH_SEED -DUSE_SITECUSTOMIZE -DPRIVLIB_LAST_IN_INC -DPERL_IMPLICIT_CONTEXT -DPERL_IMPLICIT_SYS -DUSE_PERLIO -DPERL_MSVCRT_READFIX -DHASATTRIBUTE -fno-strict-aliasing -mms-bitfields',
    optimize='-O2',
    cppflags='-DWIN32'
    ccversion='', gccversion='gcc.exe (TDM-1 mingw32) 4.4.1', gccosandvers=''
    intsize=4, longsize=4, ptrsize=4, doublesize=8, byteorder=1234
    d_longlong=undef, longlongsize=8, d_longdbl=define, longdblsize=8
    ivtype='long', ivsize=4, nvtype='double', nvsize=8, Off_t='__int64', lseeksize=8
    alignbytes=8, prototype=define
  Linker and Libraries:
    ld='C:\MinGW32\bin\g++.exe', ldflags ='-L"C:\Perl\lib\CORE"'
    libpth=\lib
    libs=-lkernel32 -luser32 -lgdi32 -lwinspool -lcomdlg32 -ladvapi32 -lshell32 -lole32 -loleaut32 -lnetapi32 -luuid -lws2_32 -lmpr -lwinmm -lversion -lodbc32 -lodbccp32 -lmsvcrt
    perllibs=-lkernel32 -luser32 -lgdi32 -lwinspool -lcomdlg32 -ladvapi32 -lshell32 -lole32 -loleaut32 -lnetapi32 -luuid -lws2_32 -lmpr -lwinmm -lversion -lodbc32 -lodbccp32 -lmsvcrt
    libc=msvcrt.lib, so=dll, useshrplib=true, libperl=perl58.lib
    gnulibc_version=''
  Dynamic Linking:
    dlsrc=dl_win32.xs, dlext=dll, d_dlsymun=undef, ccdlflags=' '
    cccdlflags=' ', lddlflags='-mdll -L"C:\Perl\lib\CORE"'

Characteristics of this binary (from libperl):
  Compile-time options: MULTIPLICITY PERL_IMPLICIT_CONTEXT PERL_IMPLICIT_SYS
                        PERL_MALLOC_WRAP PL_OP_SLAB_ALLOC USE_FAST_STDIO
                        USE_ITHREADS USE_LARGE_FILES USE_PERLIO
                        USE_SITECUSTOMIZE
  Locally applied patches:
    ActivePerl Build 827 [291969]
    f7bbab select() generates 'Invalid parameter' messages on Windows Vista.
    36f064 do/require don't treat '.\foo' or '..\foo' as absolute paths on Windows
    287a96 Fix -p function and Fcntl::S_IFIFO constant under Microsoft VC compiler
    Iin_load_module moved for compatibility with build 806
    Less verbose ExtUtils::Install and Pod::Find
    Rearrange @INC so that 'site' is searched before 'perl'
    Partly reverted #dafda6 to preserve binary compatibility
    5e162c Problem killing a pseudo-forked child on Win32
    3e5d88 ANSIfy the PATH environment variable on Windows
    c71e9b,29e136 win32_async_check() can loop indefinitely
    aeecf6 Fix alarm() for Windows 2003
  Built under MSWin32
  Compiled at Jan 26 2010 21:15:51
  @INC:
    C:/Perl/site/lib
    C:/Perl/lib

  Platform:
    osname=MSWin32, osvers=6.3, archname=MSWin32-x64-multi-thread
    uname='Win32 strawberry-perl 5.24.0.1 #1 Tue May 10 21:30:49 2016 x64'
    config_args='undef'
    hint=recommended, useposix=true, d_sigaction=undef
    useithreads=define, usemultiplicity=define
    use64bitint=define, use64bitall=undef, uselongdouble=undef
    usemymalloc=n, bincompat5005=undef
  Compiler:
    cc='gcc', ccflags =' -s -O2 -DWIN32 -DWIN64 -DCONSERVATIVE -DPERL_TEXTMODE_SCRIPTS -DPERL_IMPLICIT_CONTEXT -DPERL_IMPLICIT_SYS -fwrapv -fno-strict-aliasing -mms-bitfields',
    optimize='-s -O2',
    cppflags='-DWIN32'
    ccversion='', gccversion='4.9.2', gccosandvers=''
    intsize=4, longsize=4, ptrsize=8, doublesize=8, byteorder=12345678, doublekind=3
    d_longlong=define, longlongsize=8, d_longdbl=define, longdblsize=16, longdblkind=3
    ivtype='long long', ivsize=8, nvtype='double', nvsize=8, Off_t='long long', lseeksize=8
    alignbytes=8, prototype=define
  Linker and Libraries:
    ld='g++', ldflags ='-s -L"C:\STRAWB~1\perl\lib\CORE" -L"C:\STRAWB~1\c\lib"'
    libpth=C:\STRAWB~1\c\lib C:\STRAWB~1\c\x86_64-w64-mingw32\lib C:\STRAWB~1\c\lib\gcc\x86_64-w64-mingw32\4.9.2
    libs=-lmoldname -lkernel32 -luser32 -lgdi32 -lwinspool -lcomdlg32 -ladvapi32 -lshell32 -lole32 -loleaut32 -lnetapi32 -luuid -lws2_32 -lmpr -lwinmm -lversion -lodbc32 -lodbccp32 -lcomctl32
    perllibs=-lmoldname -lkernel32 -luser32 -lgdi32 -lwinspool -lcomdlg32 -ladvapi32 -lshell32 -lole32 -loleaut32 -lnetapi32 -luuid -lws2_32 -lmpr -lwinmm -lversion -lodbc32 -lodbccp32 -lcomctl32
    libc=, so=dll, useshrplib=true, libperl=libperl524.a
    gnulibc_version=''
  Dynamic Linking:
    dlsrc=dl_win32.xs, dlext=xs.dll, d_dlsymun=undef, ccdlflags=' '
    cccdlflags=' ', lddlflags='-mdll -s -L"C:\STRAWB~1\perl\lib\CORE" -L"C:\STRAWB~1\c\lib"'

Characteristics of this binary (from libperl):
  Compile-time options: HAS_TIMES HAVE_INTERP_INTERN MULTIPLICITY
                        PERLIO_LAYERS PERL_COPY_ON_WRITE
                        PERL_DONT_CREATE_GVSV
                        PERL_HASH_FUNC_ONE_AT_A_TIME_HARD
                        PERL_IMPLICIT_CONTEXT PERL_IMPLICIT_SYS
                        PERL_MALLOC_WRAP PERL_PRESERVE_IVUV USE_64_BIT_INT
                        USE_ITHREADS USE_LARGE_FILES USE_LOCALE
                        USE_LOCALE_COLLATE USE_LOCALE_CTYPE
                        USE_LOCALE_NUMERIC USE_LOCALE_TIME USE_PERLIO
                        USE_PERL_ATOF
  Built under MSWin32
  Compiled at May 10 2016 21:42:01
  @INC:
    C:/Strawberry/perl/site/lib
    C:/Strawberry/perl/vendor/lib
    C:/Strawberry/perl/lib


Replies are listed 'Best First'.
Re: Our perl/xs/c app is 30% slower with 64bit 5.24.0, than with 32bit 5.8.9. Why?
by davido (Cardinal) on Dec 21, 2016 at 18:09 UTC

    Are your tests getting deeper into swap or virtual memory while running under 64 bit builds?

    Under a 64 bit build of Perl, more memory is consumed because the architecture is twice as wide. A scalar on a 32 bit build may take considerably less of the system's memory than a scalar on a 64 bit build. Now apply that multiplier across large datastructures. If the memory footprint is driving your application into swap or virtual (storage-based) memory, you could start seeing large performance hits. The solution is, as always, more efficient algorithms or more hardware (in this case RAM).
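To see how much wider a given build actually is, the relevant sizes can be read from the core Config module. This is only a minimal sketch (the helper name build_sizes is made up for illustration), meant to be run under each perl being compared:

```perl
use strict;
use warnings;
use Config;

# Collect the build settings that typically differ between 32-bit and
# 64-bit perls. build_sizes is a hypothetical helper name.
sub build_sizes {
    return map { $_ => $Config{$_} } qw(archname ptrsize ivsize nvsize);
}

my %size = build_sizes();
printf "%-9s %s\n", $_, $size{$_} for qw(archname ptrsize ivsize nvsize);
```

Going by the perl -V output in the original post, the 5.8.9 build has ptrsize=4, ivsize=4, nvsize=8 and the 5.24.0 build has ptrsize=8, ivsize=8, nvsize=8, so pointers and integers double in size while doubles stay the same.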


    Dave

      Thanks for the reply.

      The test cases consume less than 100MB ram during runtime (both 64bit and 32bit builds). The data structures are mainly numeric in nature, typically doubles.

Re: Our perl/xs/c app is 30% slower with 64bit 5.24.0, than with 32bit 5.8.9. Why?
by talexb (Chancellor) on Dec 21, 2016 at 16:42 UTC

    My first thought is that the issue is with memory, and not whether it's a 32-bit or 64-bit processor running the package. Do the servers have the same amount of memory? Is there sufficient memory to run the programs? Are there more instances of the programs running? What kind of shape is the cache in?

    Alex / talexb / Toronto

    Thanks PJ. We owe you so much. Groklaw -- RIP -- 2003 to 2013.

Re: Our perl/xs/c app is 30% slower with 64bit 5.24.0, than with 32bit 5.8.9. Why?
by BrowserUk (Patriarch) on Dec 21, 2016 at 18:10 UTC

    It'd be a whole lot easier to diagnose if we knew what the application was doing. And easier still if we could see the code.

    The first thing you need to do is profile both and work out where the time is being used. Once you know that, it'll be easier to reason about the cause.


    With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday'
    Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
    "Science is about questioning the status quo. Questioning authority". The enemy of (IT) success is complexity.
    In the absence of evidence, opinion is indistinguishable from prejudice.
      It'd be a whole lot easier to diagnose if we knew what the application was doing. And easier still if we could see the code.

      The first thing you need to do is profile both and work out where the time is being used. Once you know that, it'll be easier to reason about the cause.

      Thanks for the reply. As the other Anonymous Monk said, the app is a mix of perl/xs/c and is difficult to profile on Windows (normally I would use valgrind on Linux). I did try Very Sleepy, but nothing stood out. Can you recommend a profiler for Windows?

      The app itself mainly deals with numeric data. Lots of double datatypes in C structs, with data manipulation in C, with perl providing an 'API' to make things easy for the end user, so something like this:

      my $sum = $apple + $orange;
      The $apple and $orange variables are objects (typically mapped to large double vectors), the vector calculation would be carried out in C etc.

      While our performance benchmarks are representative of our real workloads, they are very broad in nature...and contain lots and lots of code... While all the tests perform worse, the ones that stand out most (ie, 80%+ worse) do create more perl/xs objects than typical, so perhaps that is where I should start looking?
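One way to test the object-creation suspicion in isolation is a micro-benchmark that separates "few operations on large vectors" from "many operations on tiny objects", run under both perls. The following is only a sketch under loud assumptions: Vec and time_it are hypothetical names, and the pure-Perl class merely stands in for the real XS-backed objects, so the absolute numbers will not match the application.

```perl
use strict;
use warnings;
use Time::HiRes qw(time);

{
    # Hypothetical pure-Perl stand-in for the XS-backed vector class.
    package Vec;
    use overload '+' => \&add;

    sub new {
        my ($class, @values) = @_;
        return bless { data => [@values] }, $class;
    }

    # Element-wise sum; the real application would do this in C.
    sub add {
        my ($x, $y) = @_;
        my @sum = map { $x->{data}[$_] + $y->{data}[$_] } 0 .. $#{ $x->{data} };
        return Vec->new(@sum);
    }
}

# Run $code $count times and return the elapsed wall-clock seconds.
sub time_it {
    my ($count, $code) = @_;
    my $start = time;
    $code->() for 1 .. $count;
    return time - $start;
}

my $apple  = Vec->new(map { $_ * 1.5 } 1 .. 1000);
my $orange = Vec->new(map { $_ * 0.5 } 1 .. 1000);
my ($tiny_a, $tiny_b) = (Vec->new(1.0), Vec->new(2.0));

# Few operations on large vectors: dominated by the arithmetic.
printf "large vectors: %.3fs\n",
    time_it(200, sub { my $sum = $apple + $orange });

# Many operations on tiny vectors: dominated by object creation.
printf "tiny vectors:  %.3fs\n",
    time_it(200_000, sub { my $sum = $tiny_a + $tiny_b });
```

If the second timing degrades far more than the first when moving from 5.8.9 to 5.24.0, that would point at per-object overhead rather than the numeric code.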

        I did try Very Sleepy, but nothing stood out. Can you recommend a profiler for windows?

        Hm. That's the one I use for profiling C code; and I've found it very effective. Effective to the point of detecting a difference between two identical opcodes where one causes a cache miss and the other doesn't.

        I'd love to take a look at the output from identical runs with the two builds.

        the ones that stand out most (ie, 80%+ worse) do create more perl/xs objects than typical, so perhaps that is where I should start looking?

        I'd start by rebuilding the 5.24 without PERL_COPY_ON_WRITE & PERL_HASH_FUNC_ONE_AT_A_TIME_HARD individually and together and see what effect they have.

        I believe (perhaps wrongly) that the first is a space-for-speed tradeoff, which might be a factor.

        The second is an (IMO) unnecessary fix for a non-problem that substitutes a different, more time consuming hashing function for the one used in 5.8.9 for "security reasons". Try replacing PERL_HASH_FUNC_ONE_AT_A_TIME_HARD with PERL_HASH_FUNC_ONE_AT_A_TIME_OLD and see if that makes any difference.
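After each rebuild it is worth confirming which of these options the resulting perl was really compiled with. A small sketch, assuming the Config::bincompat_options and Config::non_bincompat_options helpers found in recent Config.pm versions (they are not present in 5.8.9, hence the guard); build_options is a made-up helper name:

```perl
use strict;
use warnings;
use Config;

# Return the compile-time options of the running perl that match $pattern.
# build_options is a hypothetical helper name.
sub build_options {
    my ($pattern) = @_;
    # Older Config.pm versions (such as the one in 5.8.9) lack these helpers.
    return () unless defined &Config::bincompat_options;
    return grep { /$pattern/ }
        Config::bincompat_options(), Config::non_bincompat_options();
}

print "$_\n" for build_options(qr/^PERL_(?:HASH_FUNC|COPY_ON_WRITE)/);
```

The same information appears in the "Compile-time options" section of perl -V, which also works on the older perl.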

        Beyond those guesses, I'd need to see the profiler output.



        Are you able to try Devel::NYTProf? I've found this module to be a very powerful profiling tool.


      It is a mix of perl/xs/c on windows.

        For example, if your app is doing lots of sorting of strings, the presence of these defines in the 5.24 build parameters:

        USE_LOCALE USE_LOCALE_COLLATE USE_LOCALE_CTYPE USE_LOCALE_NUMERIC USE_LOCALE_TIME

        Might be the source of the slowdown; but if your app is purely mathematical, probably not.



        That tells us what it is, but not what it is doing. But if you don't need help ...


