Perl and memory usage. Can it be released?

by sherab (Scribe)
on Feb 07, 2014 at 16:44 UTC ( [id://1073900] )

sherab has asked for the wisdom of the Perl Monks concerning the following question:

Hi monks!
I have a Perl script that spends its days reading in files and processing them. We get files that range from tiny to, in some cases, 90 MB. I know that memory allocation is somewhat elastic, but my question is about what happens when my script has been reading 90 KB files all morning, then at around 11am it reads in a 90 MB monster, and then goes back to reading 90 KB files for the rest of the day. Is all the memory that it took up for the 90 MB instance still being consumed for the rest of the day? I assume that it is, since Perl only releases that memory after the process has exited, and the process runs all day.

I also see that my perl is compiled with "usemymalloc=n". I would welcome any insights someone has on this, or any input if you've experienced this before. It would be great if that memory could somehow be released back into the wild.
UPDATED: THANK YOU so much monks! A lot of great great answers!

Replies are listed 'Best First'.
Re: Perl and memory usage. Can it be released?
by BrowserUk (Patriarch) on Feb 07, 2014 at 16:54 UTC

    Two thoughts:

    1. If your program processes the files line-by-line, then the maximum memory it would need at any given time is the length of the longest line.

      Which for most files is a trivial amount.

    2. If you really need to load each file in its entirety, then slurping it into a single huge scalar rather than an array of lines would ensure that when the file is processed and the scalar is freed, the whole allocation is returned to the OS, not just to the process's memory pool.

      Note: I know this to be true of Perl running under Windows for single allocations over 1MB.

      The picture of whether other OS mallocs have similar arrangements for large, single allocations isn't so clear.

      Of course, this only helps if you can avoid breaking the single scalar up into an array or hash; a rough sketch of both approaches follows below.
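      A minimal sketch of both ideas (untested; $file and the processing steps are placeholders, not part of the original reply):

      use strict;
      use warnings;

      my ($file) = @ARGV;

      # (1) Line-by-line: memory use stays around the size of the longest line.
      open my $in, '<', $file or die "open $file: $!";
      while (my $line = <$in>) {
          # ... process $line here ...
      }
      close $in;

      # (2) Slurp the whole file into ONE scalar (not an array of lines), so the
      #     buffer is a single large allocation that can be handed back when freed.
      my $data = do {
          open my $fh, '<:raw', $file or die "open $file: $!";
          local $/;          # slurp mode
          <$fh>;
      };
      # ... process $data without splitting it into an array or hash ...
      undef $data;           # free the single large buffer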


    With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday'
    Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
    "Science is about questioning the status quo. Questioning authority".
    In the absence of evidence, opinion is indistinguishable from prejudice.
Re: Perl and memory usage. Can it be released?
by kennethk (Abbot) on Feb 07, 2014 at 16:58 UTC

    As I understand it, the perl process holds onto any heap memory it has been allocated (I could be wrong), so yes, in your case it will always retain the memory footprint of the largest use case. There are a couple of approaches that might help ameliorate this:

    1. Can you modify your file parsing so it's streaming instead of slurping? Just because you need to process 90 MB doesn't necessarily mean you need to hold onto 90 MB of data.

    2. Can you combine the above with a database? For example, by using an SQLite database, you should be able to avoid a large memory footprint for perl while still maintaining access to the data; see the sketch after this list. You could swap that for an in-memory database if file access times become prohibitive, but I'm unclear on whether that would create a permanent memory footprint.

    3. Finally, you could have a parent process that forks, and have the children parse your files. That way, when a child is reaped, its memory is recovered.
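
    For point 2, a minimal DBI/DBD::SQLite sketch (the file name, table layout and tab-separated record format are assumptions for illustration):

    use strict;
    use warnings;
    use DBI;

    # Parse a big file straight into SQLite instead of holding it all in RAM.
    my $dbh = DBI->connect('dbi:SQLite:dbname=scratch.db', '', '',
                           { RaiseError => 1, AutoCommit => 0 });
    $dbh->do('CREATE TABLE IF NOT EXISTS records (k TEXT, v TEXT)');
    my $sth = $dbh->prepare('INSERT INTO records (k, v) VALUES (?, ?)');

    open my $fh, '<', 'big_input.txt' or die "open: $!";
    while (my $line = <$fh>) {
        chomp $line;
        my ($k, $v) = split /\t/, $line, 2;   # assumed record layout
        $sth->execute($k, $v);
    }
    close $fh;
    $dbh->commit;    # a single transaction keeps the inserts fast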

    See also http://stackoverflow.com/questions/9733146/tips-for-keeping-perl-memory-usage-low.


    #11929 First ask yourself `How would I do this without a computer?' Then have the computer do it the same way.

Re: Perl and memory usage. Can it be released?
by davido (Cardinal) on Feb 07, 2014 at 17:58 UTC

    If you require that the entire file be held in memory at once, create a supervisor script that assigns a file to a separate worker process and waits for results. When the worker process finishes and sends its results back to the supervisor, the worker terminates, freeing its resources. The supervisor's memory consumption will remain stable at all times. The worker processes may use a little, or a lot... but when they're done, they vanish and release their memory.
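
    A bare-bones sketch of that supervisor pattern (the glob pattern and process_file() are placeholders; in practice results could be passed back via a pipe or a results file):

    use strict;
    use warnings;

    my @queue = glob('incoming/*.dat');       # assumed location of the input files

    for my $file (@queue) {
        defined(my $pid = fork()) or die "fork failed: $!";
        if ($pid) {
            waitpid($pid, 0);                 # supervisor stays small; it just waits
        }
        else {
            process_file($file);              # all the big allocations happen here
            exit 0;                           # worker exits, its memory goes back to the OS
        }
    }

    sub process_file {
        my ($file) = @_;
        # placeholder for the real per-file work
    }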


    Dave

Re: Perl and memory usage. Can it be released?
by LanX (Saint) on Feb 07, 2014 at 17:01 UTC
    IIRC the "default" answer is that memory is only returned to the OS when the Perl process ends, dunno if there is any reliable documentation for a defined behavior.

    BUT as others have already pointed out, why do you need to load all 90 MB at once?

    Consider using a sliding-window technique if you really need to investigate consecutive chunks of data; a rough sketch follows.
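
    Something along these lines, with made-up chunk and overlap sizes and a placeholder scan_window():

    use strict;
    use warnings;

    my $CHUNK   = 1 << 20;    # read 1 MB at a time
    my $OVERLAP = 4096;       # context carried from one chunk to the next

    open my $fh, '<:raw', $ARGV[0] or die "open: $!";
    my $window = '';
    while (read($fh, my $buf, $CHUNK)) {
        $window .= $buf;
        scan_window($window);         # hypothetical per-window work
        # keep only the trailing overlap for the next iteration
        $window = substr($window, -$OVERLAP) if length($window) > $OVERLAP;
    }
    close $fh;

    sub scan_window { my ($w) = @_; } # placeholder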

    edit

    Worst case, consider running a separate process.

    Cheers Rolf

    ( addicted to the Perl Programming Language)

Re: Perl and memory usage. Can it be released?
by ikegami (Patriarch) on Feb 07, 2014 at 20:37 UTC
    Does it have to be the same process? You could replace
    do_work($qfn)
    with something like
    if (my $pid = fork()) {
        waitpid($pid, 0);     # parent just waits for the worker
    }
    else {
        do_work($qfn);        # the memory-hungry work happens in the child
        exit;                 # child exits here, releasing its memory to the OS
    }
Re: Perl and memory usage. Can it be released?
by oiskuu (Hermit) on Feb 07, 2014 at 21:51 UTC

    On Linux, the glibc malloc behavior is influenced by environment variables. Large allocations are performed via mmap; smaller chunks usually live in the data-segment arena, which grows or shrinks via brk. The default mmap threshold is typically 128 KB. For example:

    $ strace -e brk,mmap perl -e 'pack q(x200000)'
    ...
    brk(0x79f000)                           = 0x79f000
    mmap(NULL, 200704, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = ...
    
    $ export MALLOC_MMAP_THRESHOLD_=300000
    $ strace -e brk,mmap perl -e 'pack q(x200000)'
    ...
    brk(0x79f000)                           = 0x79f000
    brk(0x7db000)                           = 0x7db000
    brk(0x7aa000)                           = 0x7aa000
    
    The first time, the memory was obtained via mmap; the second time, by growing the arena. The arena may also shrink (here it was possible), but even if it doesn't, the unused pages are typically not much of a concern. (mmap-ed storage is unmapped when freed.)

    If the process is long-lived and does a great many allocations at various stages, then memory fragmentation may become a problem (web browsers come to mind). When processing file after file as you describe, this is unlikely to matter either: memory gets allocated and released in full every time. Just be sure there are no leaks.

      POSIX brk memory will never shrink, due to fragmentation. mmap memory/Win32 malloc can shrink because it's all managed in a linked-list chain, and the memory pages are randomly scattered throughout the process.

        The GNU libc allocator is derived from Doug Lea's malloc, a proven general-purpose allocator. Go on, unpack and read the source and the comments (I'm looking at glibc-2.17/malloc/malloc.c).

        True, trims do not happen much because small data gets allocated from fastbins. But try to malloc a lot of somewhat larger blocks (couple hundred bytes each), and free them all. You shall see a shrink.

        Update: from said malloc.c:

        DEFAULT_TRIM_THRESHOLD  128 * 1024
        DEFAULT_TOP_PAD         0
        DEFAULT_MMAP_THRESHOLD  128 * 1024
        DEFAULT_MMAP_MAX        65536
        And please don't say "never". E.g. freeing a block 64k to 128k in size triggers fastbin consolidation. If your program has performed a work cycle, freeing all temps, then it is quite possible a trim takes place. It depends on usage.

Re: Perl and memory usage. Can it be released?
by sundialsvc4 (Abbot) on Feb 07, 2014 at 17:49 UTC

    Well, this absolutely qualifies as a hack, but I know that it is a hack that is sometimes used ... and useful. After the process has run through some number of requests, let it choose to commit suicide. Then, ensure that some init-like process will recognize its death and immediately re-spawn it. Exactly as is done sometimes with FastCGI, or even with mod_perl, especially when the app in question is oldy-moldy. You make no attempt to re-engineer how the app goes about its business, having established that it still seems to work. You simply modify it to, every now and again, put itself to death. (Which is n-o-t the same as killing it!)

    Of course, it is also possible to run it by means of a do-nothing “babysitter” process that launches the other process as a child, waits for it to die, and then takes care of re-launching it ... forever.

    Hack. Wart. Inelegant. Smells bad. Quick. Works. Done.
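
    A toy version of that babysitter loop, assuming the long-running script is called worker.pl (the name is made up for illustration):

    use strict;
    use warnings;

    while (1) {
        system($^X, 'worker.pl');    # run the worker under the same perl
        warn "worker exited (status $?), restarting\n";
        sleep 1;                     # avoid a tight respawn loop if it dies instantly
    }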

Re: Perl and memory usage. Can it be released?
by bulk88 (Priest) on Feb 19, 2014 at 03:57 UTC
