Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

Monks,

I have tracked down a performance issue in my Perl script which is killing me. I have created a test script below that highlights the issue. I am running ActiveState Perl 5.8.9 (full perl -V below).

In my real script I am receiving data from a socket and adding it to an object (the data is binary, hence the pack). Once the data has been received (I can never know the size of the data in advance), I use Storable to freeze the object and pass that on to another thread for processing. Everything works well, except when I am dealing with large(ish) datasets.

Initially I was storing the data in a hash, and it was the freeze that was taking a massive amount of time (longer than it took to process the data on the socket). Google seems to suggest that this problem was due to the use of the Windows malloc rather than Perl's internal malloc. I changed the object to use a single long (binary) string, but now building this string takes a long time (and when I Ctrl-C the script I sometimes get "Out of memory!" and "panic: pp_iter at X" errors). If I pre-allocate memory for the string (using $string = 1 x 56000000), the script below finishes in a second. If I don't pre-allocate the string, the script takes over 3 minutes.

I assume my problem is the same Windows malloc problem that was suggested for the freeze issue? Any other way around this? Am I missing something?
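[Editor's note: a minimal sketch of the pre-allocation idiom described above, for reference. The record size (56 bytes) and iteration count are taken from the test script; the key assumption, which holds in practice for perl's string handling, is that assigning '' resets a string's length but leaves its underlying allocation in place, so subsequent appends reuse the buffer instead of repeatedly realloc-ing.]

```perl
use strict;
use warnings;

my $iter = 1_000_000;

# Allocate the buffer once at its final size, then empty it.
# The assignment of '' sets the length to 0 but (in practice)
# keeps the allocated capacity, so the .= below never reallocs.
my $buf = "\0" x ($iter * 56);   # one up-front allocation
$buf = '';                        # length 0, capacity retained

for my $count (1 .. $iter) {
    $buf .= pack('ddddddd', 1, 2, 3, 4, 5, 6, 7);
}
print length($buf), "\n";
```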

Regards, red.
Test Script
==================
use strict;
use warnings;
use Storable qw(freeze thaw);

#number of items to pack
my $iter = 1000000;
#our string to pack into
my $string = '';
#time how long it takes to create a string large enough to hold the data...
Time();
#if the line below is commented out, things take a long time...
$string = 1 x ($iter*56);
Time();
#reset our string...
$string = '';
#now lets create the data block...
for my $count (1..$iter) {
    $string .= pack('ddddddd',1,2,3,4,5,6,7);
}
Time();
#how long does it take to freeze this object?
my $fr = freeze( \$string );
Time();
print "Finished\n";
sleep(2000);

sub Time {
    my ($user,$system,$cuser,$csystem) = times;
    print "$user,$system\n";
}
=======================

C:\MinGW>perl -V
Set up gcc environment - gcc (TDM-1 mingw32) 4.4.1
Summary of my perl5 (revision 5 version 8 subversion 9) configuration:
  Platform:
    osname=MSWin32, osvers=5.00, archname=MSWin32-x86-multi-thread
    uname=''
    config_args='undef'
    hint=recommended, useposix=true, d_sigaction=undef
    usethreads=define use5005threads=undef useithreads=define usemultiplicity=define
    useperlio=define d_sfio=undef uselargefiles=define usesocks=undef
    use64bitint=undef use64bitall=undef uselongdouble=undef
    usemymalloc=n, bincompat5005=undef
  Compiler:
    cc='gcc', ccflags ='-DNDEBUG -DWIN32 -D_CONSOLE -DNO_STRICT -DHAVE_DES_FCRYPT -DNO_HASH_SEED -DUSE_SITECUSTOMIZE -DPRIVLIB_LAST_IN_INC -DPERL_IMPLICIT_CONTEXT -DPERL_IMPLICIT_SYS -DUSE_PERLIO -DPERL_MSVCRT_READFIX -DHASATTRIBUTE -fno-strict-aliasing -mms-bitfields',
    optimize='-O2',
    cppflags='-DWIN32'
    ccversion='', gccversion='gcc (TDM-1 mingw32) 4.4.1', gccosandvers=''
    intsize=4, longsize=4, ptrsize=4, doublesize=8, byteorder=1234
    d_longlong=undef, longlongsize=8, d_longdbl=define, longdblsize=8
    ivtype='long', ivsize=4, nvtype='double', nvsize=8, Off_t='__int64', lseeksize=8
    alignbytes=8, prototype=define
  Linker and Libraries:
    ld='g++', ldflags ='-L"C:\Perl\lib\CORE"'
    libpth=\lib
    libs=-lkernel32 -luser32 -lgdi32 -lwinspool -lcomdlg32 -ladvapi32 -lshell32 -lole32 -loleaut32 -lnetapi32 -luuid -lws2_32 -lmpr -lwinmm -lversion -lodbc32 -lodbccp32 -lmsvcrt
    perllibs=-lkernel32 -luser32 -lgdi32 -lwinspool -lcomdlg32 -ladvapi32 -lshell32 -lole32 -loleaut32 -lnetapi32 -luuid -lws2_32 -lmpr -lwinmm -lversion -lodbc32 -lodbccp32 -lmsvcrt
    libc=msvcrt.lib, so=dll, useshrplib=true, libperl=perl58.lib
    gnulibc_version=''
  Dynamic Linking:
    dlsrc=dl_win32.xs, dlext=dll, d_dlsymun=undef, ccdlflags=' '
    cccdlflags=' ', lddlflags='-mdll -L"C:\Perl\lib\CORE"'

Characteristics of this binary (from libperl):
  Compile-time options: MULTIPLICITY PERL_IMPLICIT_CONTEXT PERL_IMPLICIT_SYS
                        PERL_MALLOC_WRAP PL_OP_SLAB_ALLOC USE_FAST_STDIO
                        USE_ITHREADS USE_LARGE_FILES USE_PERLIO USE_SITECUSTOMIZE
  Locally applied patches:
    ActivePerl Build 826 [290470]
    f7bbab select() generates 'Invalid parameter' messages on Windows Vista.
    36f064 do/require don't treat '.\foo' or '..\foo' as absolute paths on Windows
    287a96 Fix -p function and Fcntl::S_IFIFO constant under Microsoft VC compiler
    Iin_load_module moved for compatibility with build 806
    Less verbose ExtUtils::Install and Pod::Find
    Rearrange @INC so that 'site' is searched before 'perl'
    Partly reverted #dafda6 to preserve binary compatibility
    5e162c Problem killing a pseudo-forked child on Win32
    3e5d88 ANSIfy the PATH environment variable on Windows
    c71e9b,29e136 win32_async_check() can loop indefinitely
    aeecf6 Fix alarm() for Windows 2003
  Built under MSWin32
  Compiled at May 24 2009 09:21:05
  @INC:
    C:/Perl/site/lib
    C:/Perl/lib
    .

Re: Memory allocation/performance issue for large strings (of binary data) under Windows.
by BrowserUk (Patriarch) on Nov 30, 2009 at 04:59 UTC

    Using Storable's freeze() on a packed scalar makes no sense at all!

    Freezing is only useful for encoding data structures into single scalars.

    A scalar is just a large chunk of RAM with some header information; freezing it will achieve nothing useful. So just pass the scalar.
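    [Editor's note: a sketch of passing the packed scalar as-is, not the poster's actual code. It assumes a plain ithreads setup; Thread::Queue copies the scalar into shared storage itself, so no freeze/thaw round trip is needed.]

```perl
use strict;
use warnings;
use threads;
use Thread::Queue;

# Hand the packed scalar to the worker directly; the queue copies it,
# so there is no Storable freeze/thaw step in the hand-off.
my $q = Thread::Queue->new;

my $worker = threads->create(sub {
    my $data = $q->dequeue;           # blocks until a batch arrives
    my @nums = unpack 'd*', $data;    # recover the doubles
    return scalar @nums;
});

my $batch = pack 'd*', 1 .. 7;        # 7 doubles, 56 bytes
$q->enqueue($batch);
print $worker->join, "\n";
```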

    And then again, if you are having to pack the numbers you are reading from the socket, then you must be reading them in as ASCII. And if, once you've passed them to another thread, you are going to process them as numbers with Perl, then you are going to have to unpack them again before you can do so. So why are you packing them?

    You will save (a little) space by packing them, but not so much as to be particularly significant. And given the extra overhead (time) it takes to put them through pack/unpack, and time seems to be your limitation, don't do that. You could just concatenate (or better, overwrite a buffer with) the ASCII as you read it, and then split it up once you've passed it to the processing thread.

    The only real advantage of packing is that it allows you to know how much space is required to hold a given number of integers. But unless you know how many you are going to read and pass, that isn't much advantage. You will still need to either pre-allocate space larger than you will ever ultimately need, or accept that you will sometimes need to grow the buffer.
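    [Editor's note: to illustrate the fixed-size point, a double packs to exactly 8 bytes, so the buffer needed for a known record count can be computed up front:]

```perl
use strict;
use warnings;

# Each record of 7 doubles packs to a fixed 56 bytes (7 * 8 bytes),
# so N records need exactly N * 56 bytes of buffer.
my $record_size = length pack 'ddddddd', (0) x 7;
my $records     = 1_000_000;
printf "%d bytes/record, %d bytes total\n", $record_size, $record_size * $records;
```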

    And finally, "passing a (large) scalar between threads" inevitably means copying it. Depending upon how you "pass" it, possibly 2 or more times. Better to pre-allocate a shared buffer (or two); overwrite the data directly into that buffer; and then pass a simple flag between the threads to tell the processing thread which buffer contains the data that is ready for processing.
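    [Editor's note: a rough sketch of that shared-buffer pattern, with hypothetical names; real code would loop and alternate between two buffers. The assumptions: a shared scalar sized for the worst case, written in place, with only the batch length crossing the thread boundary via a queue.]

```perl
use strict;
use warnings;
use threads;
use threads::shared;
use Thread::Queue;

# One shared, pre-sized buffer plus a tiny queue that carries only
# "batch ready" flags (here, the batch length in bytes).
my $buffer : shared = "\0" x (56 * 1000);   # sized for the worst case
my $ready = Thread::Queue->new;

my $worker = threads->create(sub {
    my $len  = $ready->dequeue;             # wait for the flag
    my @nums = unpack 'd*', substr($buffer, 0, $len);
    return scalar @nums;
});

# "Read" a batch: overwrite the front of the shared buffer in place,
# then pass only the length across the thread boundary.
my $batch = pack 'd*', 1 .. 7;
{
    lock($buffer);
    substr($buffer, 0, length $batch, $batch);   # 4-arg in-place splice
}
$ready->enqueue(length $batch);
print $worker->join, "\n";
```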

    Your mocked up test script doesn't tell me enough about your real script to allow me to offer something representative:

    • Do you need all 7 million integers before you can start processing?
    • Will you be reading/passing/processing, just a single batch of 7 million numbers?
    • Or many batches of 7 million numbers?
    • Or is 7 million a guesstimate of the total, which you will read in many smaller batches?
    • How will the socket reader know when it has finished reading a batch?

    Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
    "Science is about questioning the status quo. Questioning authority".
    In the absence of evidence, opinion is indistinguishable from prejudice.
      You are correct; my example is rather flawed, and in hindsight I could have simplified it and removed the pack.

      The packed structure eventually gets to C code in an XS module. Basically the Perl side is handling the parsing/socket logic, and I am using freeze/thaw as a generic message mechanism to serialize Perl objects to pass across threads (an event-based threading model; sockets and the GUI drive events). Some of these objects get 'thawed' in a pure C thread, while others end up in Perl threads.

      To answer your direct questions:

      Yes, I need all 7 million - it's a single transaction. Batch size is unknown: sometimes small (a couple of thousand), sometimes up to a couple of million. The number of batches is also unknown - sometimes 1, sometimes hundreds. An end message is sent on the socket to end a batch (the server is a 3rd party, so I can't change the API).

      In hindsight the question should have been "Why is growing a large scalar in windows so slow?"

      My results from running the example on Linux (perl -V below):

      0.06,0.02
      0.4,0.27
      2.44,0.27
      2.89,0.63
      Finished

      and commenting out the line that preallocates:

      0.07,0.01
      0.07,0.01
      2.43,0.38
      2.91,0.76

      not much difference. The Windows version is over 100 times slower (at least on my machine and Perl). Why?

      Summary of my perl5 (revision 5 version 8 subversion 8) configuration:
        Platform:
          osname=linux, osvers=2.2.24-7.0.3, archname=i686-linux-thread-multi
          uname='linux redhat-70-i386.activestate.com 2.2.24-7.0.3 #1 fri mar 14 08:28:25 est 2003 i686 unknown '
          config_args='-ders -Dcc=gcc -Dusethreads -Duseithreads -Ud_sigsetjmp -Uinstallusrbinperl -Ulocincpth= -Uloclibpth= -Accflags=-DUSE_SITECUSTOMIZE -Duselargefiles -Accflags=-DNO_HASH_SEED -Accflags=-DPRIVLIB_LAST_IN_INC -Dprefix=/opt/ActivePerl-5.8 -Dprivlib=/opt/ActivePerl-5.8/lib -Darchlib=/opt/ActivePerl-5.8/lib -Dsiteprefix=/opt/ActivePerl-5.8/site -Dsitelib=/opt/ActivePerl-5.8/site/lib -Dsitearch=/opt/ActivePerl-5.8/site/lib -Dsed=/bin/sed -Dconfig_heavy=Config_static.pl -Dcf_by=ActiveState -Dcf_email=support@ActiveState.com'
          hint=recommended, useposix=true, d_sigaction=define
          usethreads=define use5005threads=undef useithreads=define usemultiplicity=define
          useperlio=define d_sfio=undef uselargefiles=define usesocks=undef
          use64bitint=undef use64bitall=undef uselongdouble=undef
          usemymalloc=n, bincompat5005=undef
        Compiler:
          cc='gcc', ccflags ='-D_REENTRANT -D_GNU_SOURCE -DTHREADS_HAVE_PIDS -DUSE_SITECUSTOMIZE -DNO_HASH_SEED -DPRIVLIB_LAST_IN_INC -fno-strict-aliasing -pipe -D_LARGEFILE_SOURCE -D_FILE_OFFSET_BITS=64 -I/usr/include/gdbm',
          optimize='-O2',
          cppflags='-D_REENTRANT -D_GNU_SOURCE -DTHREADS_HAVE_PIDS -DUSE_SITECUSTOMIZE -DNO_HASH_SEED -DPRIVLIB_LAST_IN_INC -fno-strict-aliasing -pipe -I/usr/include/gdbm'
          ccversion='', gccversion='2.96 20000731 (Red Hat Linux 7.1 2.96-85)', gccosandvers=''
          intsize=4, longsize=4, ptrsize=4, doublesize=8, byteorder=1234
          d_longlong=define, longlongsize=8, d_longdbl=define, longdblsize=12
          ivtype='long', ivsize=4, nvtype='double', nvsize=8, Off_t='off_t', lseeksize=8
          alignbytes=4, prototype=define
        Linker and Libraries:
          ld='gcc', ldflags =''
          libpth=/lib /usr/lib /usr/local/lib
          libs=-lnsl -lgdbm -ldl -lm -lcrypt -lutil -lpthread -lc
          perllibs=-lnsl -ldl -lm -lcrypt -lutil -lpthread -lc
          libc=/lib/libc-2.2.4.so, so=so, useshrplib=false, libperl=libperl.a
          gnulibc_version='2.2.4'
        Dynamic Linking:
          dlsrc=dl_dlopen.xs, dlext=so, d_dlsymun=undef, ccdlflags='-Wl,-E'
          cccdlflags='-fPIC', lddlflags='-shared -O2'

      Characteristics of this binary (from libperl):
        Compile-time options: MULTIPLICITY PERL_IMPLICIT_CONTEXT PERL_MALLOC_WRAP
                              THREADS_HAVE_PIDS USE_ITHREADS USE_LARGE_FILES
                              USE_PERLIO USE_REENTRANT_API USE_SITECUSTOMIZE
        Locally applied patches:
          ActivePerl Build 822 [280952]
          Iin_load_module moved for compatibility with build 806
          PerlEx support in CGI::Carp
          Less verbose ExtUtils::Install and Pod::Find
          Patch for CAN-2005-0448 from Debian with modifications
          Rearrange @INC so that 'site' is searched before 'perl'
          Partly reverted 24733 to preserve binary compatibility
          MAINT31223 plus additional changes
          31324 Fix DynaLoader::dl_findfile() to locate .so files again
          26970 Make Passive mode the default for Net::FTP
          24699 ICMP_UNREACHABLE handling in Net::Ping
        Built under linux
        Compiled at Jul 31 2007 20:53:37
        @INC:
          /opt/ActivePerl-5.8/site/lib
          /opt/ActivePerl-5.8/lib

        FWIW: I can confirm that I see similar extreme differences between Ubuntu/5.10.0 & Vista/5.10.1 :(

        Looking at the differences in compile-time options used:

        Compile-time options:
            Linux:   THREADS_HAVE_PIDS USE_REENTRANT_API
            Windows: PERL_IMPLICIT_SYS PL_OP_SLAB_ALLOC USE_FAST_STDIO

        Which of those, if any, is the source of the difference ("SLAB_ALLOC"?) is something that will require someone with a much better understanding of the internals to answer.


        Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
        "Science is about questioning the status quo. Questioning authority".
        In the absence of evidence, opinion is indistinguishable from prejudice.
Re: Memory allocation/performance issue for large strings (of binary data) under Windows.
by GrandFather (Saint) on Nov 30, 2009 at 03:29 UTC

    So pre-create the string. Doing so doesn't cost a lot (assuming your sample represents your real code), so it probably doesn't impact performance much for small numbers of iterations but, as you have discovered, provides a big win for large numbers of iterations.

    If you really don't want to do that, you can get most of the benefit by using a piecemeal preallocate and copy technique:

    use strict;
    use warnings;

    #number of items to pack
    my $iter = 1000000;
    my $string;
    my $startAlloc = 0;
    my $preAllocSize = $startAlloc;

    Time ();
    $$string = '';
    #now lets create the data block...
    for my $count (1 .. $iter) {
        if ($preAllocSize && length ($$string) >= $preAllocSize) {
            $preAllocSize = length ($$string) * 2;
            my $newStr = 1 x $preAllocSize;
            $newStr = $$string;
            $string = \$newStr;
        }
        $$string .= 'x' x 20;
    }
    Time ();

    print "Finished with start alloc: $startAlloc\n";
    print "Final string length: ", length ($$string);

    sub Time {
        my ($user, $system, $cuser, $csystem) = times;
        print "$user,$system\n";
    }

    for two different runs prints:

    0.015,0
    18.345,29.78
    Finished with start alloc: 0
    Final string length: 20000000

    0.015,0
    0.374,0.046
    Finished with start alloc: 1
    Final string length: 20000000

    True laziness is hard work