BUU has asked for the wisdom of the Perl Monks concerning the following question:

Some groundwork:

The code:
sleep;
Takes roughly 1.5 megs, as reported by Task Manager in Windows.

The code:
my $x = "a" x 5_000_000; sleep;
Takes roughly 11.2 megs, again as reported by Task Manager.

The code:
my $x = "a" x 5_000_000; my $y = $x; sleep;
Takes roughly 16.1 megs.

Can anyone explain why it seems to take only half as much memory again to duplicate it? I could see it requiring either twice as much memory, to reallocate the entire string, or only a minuscule amount, to store a pointer to the original data, but half?

Re: Possibly silly perl memory allocation question, duplicating scalars
by BrowserUk (Patriarch) on Dec 14, 2004 at 23:59 UTC

    As the others have identified, x creates the string nice and efficiently, and then throws that efficiency away by copying it into the scalar, rather than pointing the scalar at the string it created :(

    If you're on a version of Perl that supports in-memory files (opening a filehandle on a reference to a scalar), here's a technique I use for allocating big strings.

    It's more efficient than x in two ways:

    1. No duplication.
    2. No initialisation.

    Of course, the latter may be a downside too.

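    #! perl -slw
    use strict;

    # -s turns -SIZE=... on the command line into $SIZE; -l makes print append "\n"
    our $SIZE ||= 10_000_000;

    sub allocBig {
        local $/;
        # Open the caller's scalar ($_[0] is an alias) as an in-memory file
        open my $memFile, '>', \$_[ 0 ] or die $!;
        # Seek to the wanted size and write a byte so the 'file' is extended
        seek $memFile, $_[ 1 ], 0;
        # Under -l this also writes "\n", hence the length of SIZE + 2 below
        print $memFile chr(0);
        return;
    }

    printf 'Check '; <STDIN>;

    my $bigScalar;
    allocBig $bigScalar, $SIZE;
    print length $bigScalar;

    printf 'Check '; <STDIN>;

    __END__
    P:\test>414880
    Check 1660/528k
    10000002
    Check 1888/10376k

    P:\test>414880 -SIZE=200000000
    Check 1664/528k
    200000002
    Check 1892/196104k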

    Of course, then you face the problem of using it without it getting freed and replaced, but that's what substr and lvalue refs are for :)
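    The reply doesn't show the substr side, so here is a minimal sketch of what "substr and lvalue refs" can look like: overwriting parts of an existing buffer in place instead of assigning a whole new string to the variable. The variable names are made up for illustration, and the buffer is built with x only to keep the sketch short; in practice it would come from something like allocBig. My understanding is that equal-length replacements like these don't need the buffer to grow, but I haven't measured that here.

```perl
use strict;
use warnings;

# Stand-in for a preallocated buffer (illustrative only; built with x for brevity).
my $buf = "\0" x 16;

# 4-arg substr: replace 5 bytes starting at offset 4, modifying $buf itself.
substr($buf, 4, 5, 'hello');

# Lvalue ref: \substr(...) is a reference to a window onto $buf;
# assigning through it writes into $buf.
my $window = \substr($buf, 0, 4);
$$window = 'ABCD';

# Equal-length replacements leave the overall length unchanged.
printf "%d %s\n", length $buf, substr($buf, 0, 9);    # prints "16 ABCDhello"
```

    Assigning a string of a different length through either form also works, but it changes the length of $buf, which rather defeats the point of preallocating.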


    Examine what is said, not who speaks.        The end of an era!
    "But you should never overestimate the ingenuity of the sceptics to come up with a counter-argument." -Myles Allen
    "Think for yourself!" - Abigail        "Time is a poor substitute for thought"--theorbtwo         "Efficiency is intelligent laziness." -David Dunham
    "Memory, processor, disk in that order on the hardware side. Algorithm, algorithm, algorithm on the code side." - tachyon
      This is an interesting technique. I have a couple of questions though.

      I'm not clear what the effect of the statement <STDIN> is after printf 'Check'. It waits for input but doesn't assign the input to anything. The docs don't talk about using <> without an assignment (except in a conditional). Would this be considered void or boolean context? Your output shows some process info on the same line as Check. Is that a side effect? Is that result OS-specific? I get nothing back on my system (WinXP with ActiveState 5.6.1).

      Is the null byte at the end of $bigScalar necessary for Perl internally, or are you using it as an end-of-file marker for program control?

      Just curious. Thanks for an interesting half hour.

      PJ
      use strict; use warnings; use diagnostics;

        The <STDIN> is there purely to make the program stop and wait at that point while I 'Check' the memory consumption in Task Manager. In void context, whatever input is typed is simply discarded.

        The memory figures you see are just what I typed at that point, so as to record that information as part of the console log. They are the "Memory usage" and "VM size" figures from the Task Manager Processes tab.

        The null byte I write could be any value. You just have to write something after you do the seek to cause the 'file' to be extended to that point, just as you would with a normal file.
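        A minimal sketch of that seek-then-write step, using a hypothetical helper alloc_exact (not from the thread) that makes the scalar exactly the requested length. It localises $\ so that no trailing record separator gets written along with the byte; under the -l switch in the script above, that separator is where the extra "\n" (and the SIZE + 2 length) comes from. This says nothing about what the bytes in the gap contain; only the length is relied on.

```perl
use strict;
use warnings;

# Hypothetical helper: extend $_[0] (an alias to the caller's scalar)
# to exactly $_[1] bytes via an in-memory filehandle.
sub alloc_exact {
    my $size = $_[1];
    die "size must be positive\n" unless $size > 0;
    open my $fh, '>', \$_[0] or die "open: $!";
    local $\;                           # no record separator, even under -l
    seek $fh, $size - 1, 0 or die "seek: $!";
    print $fh "\0";                     # writing past the end extends the scalar
    close $fh or die "close: $!";
    return;
}

my $buf;
alloc_exact($buf, 1_000_000);
print length($buf), "\n";
```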


      That's extremely interesting; I'll have to take a look at it. I'll also have to take a close look at substr and lvalue refs...
Re: Possibly silly perl memory allocation question, duplicating scalars
by ysth (Canon) on Dec 14, 2004 at 22:48 UTC
    Most ops have a target, an invisible lexical that stores their result. "a" x 5_000_000 is initially stored in this target (bound to that x operation), then assigned (copied) into $x. For some operations of the form $lexical = expr1 op expr2, the op target is diverted onto $lexical and the assignment is optimized away, but the repeat (x) op isn't one of those; I don't know why.

    Targets stay allocated for as long as their code is around (in the theory that next time the op is used, space for its result needn't be allocated again); the only easy way to reclaim their memory is to remove all references to the code they are in.

Re: Possibly silly perl memory allocation question, duplicating scalars
by ikegami (Patriarch) on Dec 14, 2004 at 22:41 UTC

    Could it be:

    • A temporary takes 5MB.
    • The copy in $x takes 5MB.
    • The copy in $y takes 5MB.
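    Taking that breakdown at face value, the figures in the question roughly add up. A back-of-the-envelope check, assuming Task Manager's "megs" are units of 1,048,576 bytes, that each copy is about 5,000,000 bytes, and ignoring allocator overhead (all assumptions, not measurements):

$$
\begin{aligned}
\text{one copy} &= 5\,000\,000\ \text{B} / 1\,048\,576 \approx 4.77\ \text{MB}\\
1.5 + 2 \times 4.77 &\approx 11.0\ \text{MB} \quad (\text{observed } 11.2)\\
11.0 + 4.77 &\approx 15.8\ \text{MB} \quad (\text{observed } 16.1)
\end{aligned}
$$

    Both estimates land within about 0.3 MB of the observed figures, so three full copies (a temporary, $x and $y) would fit the numbers without $y needing only "half" a copy.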
Re: Possibly silly perl memory allocation question, duplicating scalars
by zejames (Hermit) on Dec 14, 2004 at 22:49 UTC

    I guess perl implements some kind of "copy on write" that uses memory in an intelligent way. Can anyone confirm or deny?


    --
    zejames