in reply to Re^3: how apply large memory with perl?
in thread how apply large memory with perl?

Thank you, that cleared things up perfectly. I was familiar with the bit from perldata about pre-sizing large arrays for efficiency (although it describes the gain as "miniscule," which is clearly not the case here), but I assumed that meant the CPU efficiency of not requiring repeated re-allocations of memory -- the doubling you speak of. I didn't realize that those doublings were done in new memory rather than appending to what was already allocated, causing inefficiency memory-wise too. And I hadn't thought about the fact that two arrays exist at once in the first method, but it makes perfect sense.

Aaron B.
Available for small or large Perl jobs; see my home node.


Re^5: how apply large memory with perl?
by BrowserUk (Patriarch) on Aug 09, 2012 at 14:41 UTC
    I didn't realize that those doublings were done in new memory rather than appending to what was already allocated

    If the process already has sufficient memory for the doubling, and the memory immediately above the existing allocation is free, then the C-style array of pointers that forms the backbone of a Perl array may be realloc()able in-place, thereby alleviating the necessity for the previous and next sized generations to coexist. It also avoids the necessity to copy the pointers. But that's a pretty big if.
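
    To make the pre-extending point concrete, here is a minimal sketch (not from the original post) that fills an array either by letting the backbone grow and double, or after pre-extending it with $#a = $n. It assumes a Linux system where memory can be read from /proc/self/status; on Windows you could grep `tasklist` for $$ instead, as in the examples below. The script name and output format are mine.

        #!/usr/bin/perl
        # Sketch only: compare growing an array element by element against
        # pre-extending it with $#a = $n. Memory figures come from
        # /proc/self/status (Linux); run once per mode so the two
        # measurements do not pollute each other.
        use strict;
        use warnings;

        sub mem_kb {                          # ( current RSS, peak RSS ) in kB
            my ( $rss, $hwm ) = ( -1, -1 );
            open my $fh, '<', '/proc/self/status' or return ( $rss, $hwm );
            while ( <$fh> ) {
                $rss = $1 if /^VmRSS:\s+(\d+)/;
                $hwm = $1 if /^VmHWM:\s+(\d+)/;
            }
            return ( $rss, $hwm );
        }

        my ( $mode, $n ) = ( $ARGV[0] // 'grow', $ARGV[1] // 1_000_000 );

        printf "before: rss %d kB, peak %d kB\n", mem_kb();

        my @a;
        $#a = $n if $mode eq 'presize';       # one backbone allocation up front
        $a[ $_ ] = $_ for 0 .. $n;            # otherwise the backbone doubles as it grows

        printf "after : rss %d kB, peak %d kB (%s, %d elements)\n",
            mem_kb(), $mode, $n + 1;

    Run it once as perl presize.pl grow and once as perl presize.pl presize (the file name is just a placeholder) and compare the peak figures: the peak captures the transient moment where the old and new backbones coexist during a realloc that could not be done in place.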

    That said, by far the biggest save comes from avoiding building big lists on the stack. For example, compare iterating an array using:

    • for
      $#a = 1e6;;
      say grep /$$/, `tasklist`;;
      perl.exe   6740 Console   1    17,120 K

      $n=1e6-1; $t=time; ++$a[ $_ ] for 0..$n; print time()-$t;;
      0.242353916168213

      say grep /$$/, `tasklist`;;
      perl.exe   6740 Console   1    41,312 K
    • map
      $#a = 1e6;;
      say grep /$$/, `tasklist`;;
      perl.exe   4816 Console   1    17,128 K

      $n=1e6-1; $t=time; map ++$a[ $_ ], 0..$n; print time()-$t;;
      0.32020902633667

      say grep /$$/, `tasklist`;;
      perl.exe   4816 Console   1    88,688 K
    • But it's not as simple as just 'for is better than map'. This time, using the oft-championed Perl-style foreach loop:
      $#a = 1e6;;
      say grep /$$/, `tasklist`;;
      perl.exe   8152 Console   1    17,152 K

      $t=time; ++$_ for @a; print time()-$t;;
      0.709916114807129

      say grep /$$/, `tasklist`;;
      perl.exe   8152 Console   1    73,712 K

      In this case, the Perl foreach-style loop ends up using twice as much memory and three times as much CPU as the much-decried iterator-style for loop! A self-contained version of this comparison is sketched below.
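
    For anyone who wants to re-run the comparison, here is a minimal sketch (not from the original session) that benchmarks one loop style per invocation, mirroring the fresh-process tasklist measurements above. It reads peak memory from /proc/self/status, so that part assumes Linux; the script name and output format are mine.

        #!/usr/bin/perl
        # Sketch only: time and peak memory for one of the three loop styles
        # discussed above, chosen on the command line, in a fresh process.
        use strict;
        use warnings;
        use Time::HiRes qw( time );

        sub peak_kb {                         # peak resident set size (VmHWM) in kB
            open my $fh, '<', '/proc/self/status' or return -1;
            while ( <$fh> ) { return $1 if /^VmHWM:\s+(\d+)/ }
            return -1;
        }

        my $style = $ARGV[0] // 'for';
        my $n     = 1e6 - 1;

        my @a;
        $#a = 1e6;                            # pre-extend, as in the runs above

        my $start = time;

        if    ( $style eq 'for' )     { ++$a[ $_ ] for 0 .. $n }   # index iteration
        elsif ( $style eq 'map' )     { map ++$a[ $_ ], 0 .. $n }  # map in void context
        elsif ( $style eq 'foreach' ) { ++$_ for @a }              # aliasing foreach
        else                          { die "unknown style '$style'\n" }

        printf "%-7s  %.3f s   peak %d kB\n", $style, time() - $start, peak_kb();

    Invoke it three times, e.g. perl loops.pl for, perl loops.pl map, perl loops.pl foreach, and compare the lines; absolute numbers will differ from the tasklist figures above, but the relative differences should be recognisable.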

    When you routinely work with very large volumes of data and CPU-bound processes, rather than the (typically CGI-based) IO-bound processes where 1 MB is often considered "big data", you can count yourself lucky that the preponderance of programmers and pundits who fall into the latter camp have not yet exercised much influence on the nature of Perl.

    I revel in Perl's TIMTOWTDI, which allows me to tailor my usage to the needs of my applications, rather than being forced into the straitjacket of the theoretical "best way" as defined by someone(s) working in unrelated fields with entirely different criteria.

    If I wanted the world of "only one good way", I'd use python.


    With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday'
    Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
    "Science is about questioning the status quo. Questioning authority".
    In the absence of evidence, opinion is indistinguishable from prejudice.

    The start of some sanity?