in reply to Re^2: Tokenising a 10MB file trashes a 2GB machine
in thread Tokenising a 10MB file trashes a 2GB machine
    my $content = decode('UTF-8', 'tralala ' x 1E6);
    my @a;
    $#a = 10_000_000;   # presize array
    for (1..5) {
        print "ITER $_\n";
        push @a, split m{(\p{Z}|\p{IsSpace}|\p{P})}ms, $content;
        procinfo();
    }

which on my system gives the following output:

    ITER 1
    Vsize: 248.18 MiB ( 260235264)  RSS : 62362 pages
    ITER 2
    Vsize: 317.14 MiB ( 332550144)  RSS : 80000 pages
    ITER 3
    Vsize: 393.71 MiB ( 412839936)  RSS : 99598 pages
    ITER 4
    Vsize: 579.46 MiB ( 607612928)  RSS : 147156 pages
    ITER 5
    Vsize: 625.23 MiB ( 655597568)  RSS : 158895 pages

That averages about 94 MB of growth per iteration, or roughly 47 bytes per string pushed onto @a. Allowing 32 bytes of per-string overhead (the SV and PV structures) leaves about 15 bytes per string, which, after accounting for the trailing \0, rounding up to a multiple of 4, malloc overhead, etc., looks reasonable.
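As a rough sanity check on that arithmetic, here is a sketch in Python (Perl itself is not shown runnable here): Python's `re.split` with a capturing group, like Perl's `split` with one, returns the captured separators as list elements, so each `'tralala '` contributes about two strings per iteration. The ~2,000,000-strings-per-iteration count and the use of the first and last Vsize figures to average the growth are assumptions, not the original poster's exact method.

```python
import re

# Analogue of the Perl split with a capturing group: re.split also
# returns the captured separators, so each word yields ~2 strings.
content = 'tralala ' * 1_000_000
parts = [s for s in re.split(r'(\s)', content) if s]  # drop the empty trailing field
print(len(parts))  # 2,000,000 strings pushed per iteration

# Vsize figures (bytes) from the five iterations above.
vsize = [260235264, 332550144, 412839936, 607612928, 655597568]
growth_per_iter = (vsize[-1] - vsize[0]) / (len(vsize) - 1)
print(round(growth_per_iter / 2**20, 2))        # 94.26 MiB growth per iteration
print(round(growth_per_iter / len(parts), 1))   # 49.4 bytes per string
```

Dividing by 2**20 (MiB) gives ~49 bytes per string rather than 47; the original ~47 figure corresponds to treating the 94 as decimal megabytes (94e6 / 2e6). Either way the ballpark supports the overhead accounting.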
Dave.