Looking at it in Dependency Walker, ActiveState 5.10.1 has a Perl510.dll with entry points win32_malloc, win32_calloc and win32_realloc. In the source code these just call the CRT malloc/calloc/realloc; they don't do any magic with the Win32 Heap APIs.
I noticed that the code is all compiled in Debug. A feature of Windows is that a process can have custom heaps, and the Microsoft CRT uses a different (debug) heap for malloc/calloc/realloc in Debug builds. For example, it adds sanity markers between allocated blocks, keeps track of each allocation, and so on. gcc toolchains (through glibc's malloc) can do a similar thing, but only when certain environment variables are set.
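To make the "sanity markers" idea concrete, here is a rough sketch of the kind of extra work a debug heap does on every allocation. It is not the CRT's actual implementation: the names dbg_malloc, dbg_check and dbg_free are made up for illustration, and the 4-byte guards filled with 0xFD are only loosely modeled on what the MSVC debug heap does. Each block carries a small header plus padding on both sides, and the padding is verified when the block is freed.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

#define GUARD_SIZE 4      /* bytes of padding on each side of the user block */
#define GUARD_FILL 0xFD   /* fill value for the padding (assumed for this sketch) */

/* Header stored in front of every block so the checker knows its size. */
typedef struct {
    size_t size;
} dbg_header;

/* Allocate `size` usable bytes wrapped in a header and two guard regions.
 * Simplification: the returned pointer is not guaranteed to keep malloc's
 * alignment, so this sketch is only suitable for byte buffers. */
void *dbg_malloc(size_t size)
{
    unsigned char *raw = malloc(sizeof(dbg_header) + GUARD_SIZE + size + GUARD_SIZE);
    if (raw == NULL)
        return NULL;

    ((dbg_header *)raw)->size = size;

    unsigned char *user = raw + sizeof(dbg_header) + GUARD_SIZE;
    memset(user - GUARD_SIZE, GUARD_FILL, GUARD_SIZE);  /* guard before the block */
    memset(user + size, GUARD_FILL, GUARD_SIZE);        /* guard after the block  */
    return user;
}

/* Return 1 if both guard regions are intact, 0 if either was overwritten. */
int dbg_check(const void *p)
{
    const unsigned char *user = p;
    const dbg_header *h =
        (const dbg_header *)(user - GUARD_SIZE - sizeof(dbg_header));
    const unsigned char *front = user - GUARD_SIZE;
    const unsigned char *back = user + h->size;

    for (size_t i = 0; i < GUARD_SIZE; i++) {
        if (front[i] != GUARD_FILL || back[i] != GUARD_FILL)
            return 0;
    }
    return 1;
}

/* Verify the guards, then release the underlying allocation. */
void dbg_free(void *p)
{
    if (p == NULL)
        return;
    assert(dbg_check(p));  /* a damaged guard means a buffer overrun or underrun */
    free((unsigned char *)p - GUARD_SIZE - sizeof(dbg_header));
}
```

Even this toy version does two extra memset calls per allocation and a scan per free, and a real debug heap also fills new and freed memory and links every block into a list. For a program that allocates a lot of small blocks, that overhead can plausibly add up.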
Whether Debug would have such a drastic effect on performance I cannot say, but it's a good place to start.
Update: For details of the debug overhead, see http://msdn.microsoft.com/en-us/library/bebs9zyz(VS.80).aspx.