This needs twice as much memory, though (temporarily, for building the intermediate list).
Usually, this won't be an issue, but it might be relevant when dealing with huge arrays, in order to limit peak memory usage.
> This needs twice as much memory, though (temporarily, for building the intermediate list).
Only if [list of data extracted from line] takes the same amount of memory as the line itself. It's more likely the array takes more memory than the list.
## Create a test datafile of ~ 4MB
perl -E"say 'x 'x10 for 1 .. 2e5" > junk.dat
## Then load it into an array of arrays using a while loop
## and check the memory consumed using the Task Manager or TOP
perl -E"$n=0;$a[$n++]=[split] while $_=<>; <STDIN>" junk.dat
## On my system the process has used 214.8 MB
## Now do the same thing using map
perl -E"@a=map[split],<>; <STDIN>" junk.dat
## On my system this process has used 345.1 MB.
With map:
- first, a single list of all the lines in the file is constructed;
- from that, a second (output) list of all the references to the small arrays is constructed;
- finally, that list is assigned to the array.
Whilst the final AoA will consume the same amount of memory in both cases, the memory consumed by the intermediate lists will have considerably increased the overall memory required to construct the final array.
And depending upon your OS, the time taken to build it can be considerably longer using map.
Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
"Science is about questioning the status quo. Questioning authority".
In the absence of evidence, opinion is indistinguishable from prejudice.
Sorry, I should've been more precise. Perl constructs an intermediate list of arrays, which is held on Perl's stack (in its entirety) before it is assigned to the final array. This temporarily consumes a lot of memory, which is not returned to the OS (at least not with typical Unix-builds of Perl). Of course, the memory is returned to Perl's own memory pool, so it may be reused later. But the peak memory usage of the process increases.
As soon as references are involved, the additional temporary data only involves the first-level elements, i.e. the array references in this case. The referenced data isn't being duplicated, of course. Still, if the ratio of references to payload data is bad (i.e. second-level arrays with only a few non-complex items), there can still be considerable overhead.
Just for comparison, for anyone interested:
sub mem {
    print "$_[0]:\n";
    system "/bin/ps", "-osize,vsize", $$;
}

my @a;
my %tests = (
    # immediate data - no references
    iter_flat => sub {
        my $n = shift;
        push @a, $_*42 for 1..$n;
    },
    func_flat => sub {
        my $n = shift;
        @a = map $_*42, 1..$n;
    },
    # indirect/referenced data
    iter_ref => sub {
        my $n = shift;
        push @a, [ $_*42 ] for 1..$n;
    },
    func_ref => sub {
        my $n = shift;
        @a = map [ $_*42 ], 1..$n;
    },
);

my $what = shift @ARGV;
my $n    = shift @ARGV || 10_000_000;

mem("before");
$tests{$what}->($n);
mem("after");
$ ./883539.pl iter_flat
before:
SZ VSZ
608 22032
after:
SZ VSZ
395196 416620
$ ./883539.pl func_flat
before:
SZ VSZ
608 22032
after:
SZ VSZ
1390892 1412316 # map needs 3.5 times as much
$ ./883539.pl iter_ref
before:
SZ VSZ
608 22032
after:
SZ VSZ
1547832 1569256
$ ./883539.pl func_ref
before:
SZ VSZ
608 22032
after:
SZ VSZ
2571632 2593056 # map needs 1.7 times as much