in reply to Re^4: Does "preallocating hash improve performance"? Or "using a hash slice"?
in thread Does "preallocating hash improve performance"? Or "using a hash slice"?
Okay, let me try responding to this again. Blow by blow this time.
> I hope perl is not so inefficient that it copies all the content of a list when it needs to process it, when simply pointing to the already allocated elements would be enough.
Doesn't change anything.
AFAIK: Perl(5) is still more efficient (for most things) than other similar languages (Python, Ruby, BASH, CMD, Powershell etc.); in part because of the efficiency (minimalism) of its runloop and stack-based opcodes.
However, it is less efficient than (uncompiled) Java or Javascript or Julia etc.; in part because of its stack-based opcodes.
The stack-based opcodes lend themselves to "efficient dispatch" when running as a pure bytecode interpreter; but they do not lend themselves to great efficiency gains through JIT compilation.
The J-languages were either designed from the outset (Julia), or have evolved over time under corporate sponsorship (Java & Javascript), to lend themselves to efficient JIT compilation at runtime. Perl has not.
I didn't say anything about "copying ... a list". I said: "Turn the ... array, ... into a ... list".
Perl's opcodes are stack based. All arguments are passed on the (Perl, not C) stack. The arguments to an array assignment: @hash{ @keys } = @values; are the two arrays @keys & @values. These are passed to the opcode on the stack.
The process of converting an array to a list on the stack is not a copy operation per se, but rather an aliasing operation. The original SVs in the array are not copied, but their addresses are pushed onto the stack. A 10 million element array results in 10 million SV*s being pushed onto the stack.
Space for those 10 million SV*s has to be allocated; and all the stack pointers and the associated "Mark stack" -- another stack that manages the Perl stack -- have to be maintained and updated.
Ditto for the second argument array (@values). So, the process is not a "copy operation", but neither is it trivial.
It does "simply point"; but it points at each of the 20 million elements individually.
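You can see the aliasing directly from Perl: the elements of @_ inside a sub are exactly those SV*s that were flattened onto the stack, so writing through them changes the caller's array. A minimal demonstration:

    use strict;
    use warnings;

    my @array = ( 1 .. 5 );

    # @_ holds aliases to the SVs pushed onto the Perl stack, not copies;
    # writing through $_[0] modifies the caller's $array[0] directly.
    sub bump { $_[0] += 100 }

    bump( @array );
    print "@array\n";    # prints: 101 2 3 4 5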
> Couldn't this be changed to simply stack 2 pointers, one to each of the two arrays?
Yes, if the arguments to the array assignment are both arrays, but what about this: @hash{ 1, @small, 10 .. 25, @more } = @array[ reverse 0 .. $#array ];
The point of that contrived example is to show that the destination and source lists of what is more properly termed a list assignment (though known internally as array assignment) can be made up of elements drawn from (any combination of) array and hash elements, constants, ordinary scalars, slices of arrays or hashes, etc.
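A small, runnable version of the same shape of statement (the arrays and their sizes are invented purely for illustration):

    use strict;
    use warnings;

    my %hash;
    my @small = ( 2, 3 );
    my @more  = ( 26, 27 );
    my @array = ( 'a' .. 'z' );

    # destination list: a constant, an array, a range, another array;
    # source list: a reversed array slice. Each side flattens to one list.
    @hash{ 1, @small, 10 .. 25, @more } = @array[ reverse 0 .. $#array ];

    print scalar keys %hash, " keys assigned\n";    # 21 keys assigned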
So, for those occasions when the destination list is entirely defined by one array, and the source list entirely defined by another array, it would be possible to only pass references to those two arrays on the stack, and thus avoid the large-scale allocations; but it would require special casing, and probably another opcode.
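(For what it's worth: in pure Perl you can avoid the two huge stack lists today by assigning pairwise, though the extra opcode dispatch per element usually makes it slower than the slice; it trades transient memory for time:)

    # One key/value pair per trip around the runloop: no 20-million
    # entry lists on the Perl stack, but many more opcodes executed.
    $hash{ $keys[ $_ ] } = $values[ $_ ] for 0 .. $#keys;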
> Even more so with arrays.
Um. The example contains arrays. So if the previous sentence was not talking about arrays, what was it talking about?
> I'd also expect some COW mechanism to remove a lot of allocating and copying.
CopyOnWrite generally applies to entire data segments of a process that are cheaply shared with another process; that's obviously not applicable here.
Perl does have some internal flags and references with "COW" in their names, whereby the copying of scalars is avoided (by aliasing) until and unless they are written to; but as the argument lists (destination & source) to op_aassign are inherently readonly, that does not apply here either.
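On perls new enough to have CoW strings (5.20+), Devel::Peek will show you those flags on an ordinary scalar copy:

    use Devel::Peek;

    my $a = 'hello world' x 1000;
    my $b = $a;      # no byte copy yet: $b shares $a's string buffer
    Dump( $b );      # FLAGS include IsCOW; note the shared-buffer refcount
    $b .= '!';       # the write forces the real copy
    Dump( $b );      # IsCOW gone; $b now owns its own buffer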
> Keys might be more trouble though, as all data stored in them have to be stringified first (and I wouldn't be surprised to learn that hash keys always hold a copy of the initial string, since as far as I can tell they are not standard scalar values).
Since CoW is not applicable to the destination and source lists, most of that is irrelevant; but I can confirm that hash keys are not normal scalars, and even if they are already in memory as scalars, the text will be copied into the HV structure.
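A quick way to see that for yourself: use a reference as a key and note that only its string form survives:

    use strict;
    use warnings;

    my %h;
    my $ref = [ 1, 2, 3 ];
    $h{ $ref } = 'x';    # the key becomes the *stringified* ref: ARRAY(0x...)

    my ( $key ) = keys %h;
    print ref( $key ) ? "still a reference\n" : "just a string: $key\n";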
> I agree that the memory usage, and number of copies is certainly higher when you go all the way to slicing, but I don't expect "at least 4 times more" memory.
For the statement: @hash{ @keys } = @values; here is the memory usage:
    C:\test>p1
    print mem;;
    9,356 K
    $#keys = 10e6; $#values = 10e6; $keys[ $_ ] = "$_", $values[ $_ ] = $_ for 0 .. 10e6;;
    print mem;;
    2,000,088 K
    @hash{ @keys } = @values;;
    print mem;;
    4,394,716 K
    print size( \%hash );;
    783106748
So: final memory usage (4,394,716 K) minus initial memory usage (9,356 K) gives the memory used by the two arrays, the final hash, and all the intermediate allocations (the stack, smaller versions of the hash during construction, and other bits & bobs): 4,385,360 K, or 4490608640 bytes.
And, 4490608640 / 783106748 = 5.734350586901084908030954676437. Your expectations are wrong.
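For anyone without my REPL helpers who wants to reproduce the hash's footprint, Devel::Size will get you that last number (give or take build differences):

    use strict;
    use warnings;
    use Devel::Size qw( total_size );

    my ( @keys, @values, %hash );
    $#keys = 10e6; $#values = 10e6;        # preextend both arrays
    ( $keys[ $_ ], $values[ $_ ] ) = ( "$_", $_ ) for 0 .. 10e6;

    @hash{ @keys } = @values;
    print total_size( \%hash ), "\n";      # ~783 MB here; varies by build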
I can't see any value in going further.