in reply to Why is Windows 100 times slower than Linux when growing a large scalar?

Looking at Dependency Walker, ActiveState 5.10.1 has a Perl510.dll with entry points win32_malloc, win32_calloc and win32_realloc. In the source code they just call the CRT malloc/calloc/realloc; they don't do any magic with the Win32 Heap APIs.

I noticed that the code is all compiled in Debug. A feature of Windows is that a process can have custom heaps, and MSVCRT uses a different heap for malloc/calloc/realloc in Debug builds. For example, it adds sanity markers between allocated blocks, keeps track of each allocation, and so on. gcc can do a similar thing, but it requires environment variables to be set.

Whether Debug would have such a drastic effect on performance I cannot say, but it's a good place to start.
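
One way to check the Debug theory would be to take Perl out of the picture and time the CRT directly. Below is a rough sketch in C, not anything from the Perl source: it grows a buffer in small steps with realloc, which is only my assumption about the allocation pattern that matters here. The names grow_with_realloc and time_growth are made up for this example.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>

/* Hypothetical helper: grow a buffer from 0 to `total` bytes in
   `step`-byte increments using realloc, touching each new region.
   Returns the number of realloc calls made, or -1 on failure. */
long grow_with_realloc(size_t total, size_t step)
{
    char *buf = NULL;
    size_t size = 0;
    long calls = 0;

    if (step == 0)
        return -1;              /* would never make progress */

    while (size < total) {
        size_t next = size + step;
        char *tmp;

        if (next > total)
            next = total;       /* last increment may be shorter */

        tmp = realloc(buf, next);
        if (tmp == NULL) {
            free(buf);          /* realloc leaves the old block intact */
            return -1;
        }
        buf = tmp;
        memset(buf + size, 'x', next - size);  /* touch the new bytes */
        size = next;
        calls++;
    }

    free(buf);
    return calls;
}

/* Hypothetical helper: CPU seconds spent in one growth run,
   or -1.0 if the allocation failed. */
double time_growth(size_t total, size_t step)
{
    clock_t start = clock();
    long calls = grow_with_realloc(total, step);
    clock_t end = clock();

    if (calls < 0)
        return -1.0;
    return (double)(end - start) / CLOCKS_PER_SEC;
}
```

Call time_growth from a small main, for example with a total of a few hundred megabytes and a step of a few kilobytes, then build it once in Debug and once in Release and compare the two numbers. If Debug comes out much slower, that would support the theory; if not, the cause is probably somewhere else.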

Update: For details of the debug overhead, see http://msdn.microsoft.com/en-us/library/bebs9zyz(VS.80).aspx.