jakito has asked for the wisdom of the Perl Monks concerning the following question:

I read data into an anonymous array reference to save memory.

Normally, people read a file in like this:

sub read_file {
    my $url = shift;
    my $file_data;
    open my $fh, '<', $url;
    local $/;
    $file_data = <$fh>;
    show_size();
}

show_size();
read_file('name1.txt');
read_file('name2.txt');
read_file('name3.txt');
read_file('name4.txt');
read_file('name5.txt');
show_size();

sub show_size {
    print `top -l 1 | grep perl | awk '{print "MEM="\$8 "\tRPRVT="\$30}'`;    # Work on Mac OS X 10.10
}

Output:
MEM=984K+ RPRVT=N/A
MEM=2052K+ RPRVT=N/A
MEM=2056K+ RPRVT=N/A
MEM=2060K+ RPRVT=N/A
MEM=4900K+ RPRVT=N/A
MEM=34M+ RPRVT=N/A
MEM=34M+ RPRVT=N/A

The data held in the scalar $file_data is not released until the program ends, so memory usage keeps growing.

So I read the file this way instead:

sub read_file {
    my $url = shift;
    my $file_data = [];
    open my $fh, '<', $url;
    local $/;
    $file_data->[0] = <$fh>;
    show_size();
}

show_size();
read_file('name1.txt');
read_file('name2.txt');
read_file('name3.txt');
read_file('name4.txt');
read_file('name5.txt');
show_size();

sub show_size {
    print `top -l 1 | grep perl | awk '{print "MEM="\$8 "\tRPRVT="\$30}'`;    # Work on Mac OS X 10.10
}

Output:
MEM=1048K+ RPRVT=N/A
MEM=2084K+ RPRVT=N/A
MEM=2084K+ RPRVT=N/A
MEM=2084K+ RPRVT=N/A
MEM=4928K+ RPRVT=N/A
MEM=37M+ RPRVT=N/A
MEM=4932K+ RPRVT=N/A

Perl manages memory by reference counting. At the end of the block, $file_data->[0] is released automatically.

But is it memory safe? Or is there something wrong with it? I don't understand everything that is going on, but it looks great.

The script was written on Mac OS X 10.10.2 and tested with Perl 5.18.2. You may need to adapt it for your system.
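
One possible adaptation for other Unix-like systems is sketched below. It assumes a ps that accepts "-o rss=" and "-p" (true of the ps shipped with Mac OS X and most Linux distributions, but not guaranteed everywhere). The name rss_kb is hypothetical and reports resident set size rather than the MEM column that top prints, so the numbers will not match the output above exactly.

```perl
use strict;
use warnings;

# Hypothetical stand-in for show_size(): ask ps for this process's
# resident set size in kilobytes. Assumes ps supports "-o rss=" and "-p".
sub rss_kb {
    my $out = `ps -o rss= -p $$`;
    my ($kb) = defined $out ? $out =~ /(\d+)/ : ();
    return $kb;    # undef if ps is missing or printed nothing usable
}

my $kb = rss_kb();
print defined $kb ? "RSS=${kb}K\n" : "RSS unavailable\n";
```

On Windows, ps is normally not available and a different tool would be needed.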

Re: Is it a good way to read in arr_ref? (Updated)
by Athanasius (Cardinal) on Jan 31, 2016 at 07:51 UTC

    Hello jakito, and welcome to the Monastery!

    First, your code has an error: in sub read_file () the prototype specifies no arguments, but the function is then twice called with one argument. In general, prototypes should be avoided anyway unless you have a good reason to use them. See Far More than Everything You've Ever Wanted to Know about Prototypes in Perl -- by Tom Christiansen for the gory details.
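
    To illustrate the prototype mismatch, here is a minimal sketch. The sub names no_args and one_arg are hypothetical and stand in for read_file. The snippet is compiled with a string eval only so that the compile-time error can be captured and printed instead of aborting the script.

```perl
use strict;
use warnings;

# Compile a small snippet at run time so its compile-time error lands in $@.
my $ok = eval q{
    sub no_args () { return 'called' }    # () declares "takes no arguments"
    no_args('name1.txt');                 # rejected when the snippet is compiled
    1;
};
print "with prototype: $@" unless $ok;

# Without the prototype the same call is accepted.
sub one_arg { my $file = shift; return "reading $file" }
print "without prototype: ", one_arg('name1.txt'), "\n";
```

    Dropping the empty parentheses from the sub definition is enough to make the call legal.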

    Now to the main issue: why do you say that the contents of $file_data will not be released until the program ends? That variable is reference-counted like any other, so when its reference count falls to zero it becomes eligible for garbage collection. It may not actually be garbage collected until a later time, but that applies to any variable, including $file_data->[0]. So, why do you expect the second approach to be more memory-efficient than the first?
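
    As a rough illustration of the reference-counting point, the sketch below uses a hypothetical Tracker class whose DESTROY method records when an object's reference count reaches zero. It only shows when Perl drops the object held by each kind of variable. It says nothing about when, or whether, the process hands memory back to the operating system, which is what top measures.

```perl
use strict;
use warnings;

# Hypothetical helper class: remembers the name of each destroyed object.
package Tracker;
our @destroyed;
sub new     { my ($class, $name) = @_; return bless { name => $name }, $class }
sub DESTROY { my $self = shift; push @destroyed, $self->{name} }

package main;

sub hold_in_scalar {
    my $data = Tracker->new('scalar');            # object held by a plain lexical
    return;                                       # $data goes out of scope here
}

sub hold_in_array_ref {
    my $data = [];
    $data->[0] = Tracker->new('array element');   # object held inside an anonymous array
    return;                                       # the array, and so its element, go away here
}

hold_in_scalar();
hold_in_array_ref();

# Both objects have already been destroyed by the time we get here.
print "destroyed: $_\n" for @Tracker::destroyed;
```

    In both subs the object is destroyed as soon as the sub returns, which is the sense in which the two variables are reference-counted alike.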

    From your reference to Devel::Peek it seems you’ve already conducted further experiments not outlined in your post. Please supply details, and specify the platform (OS and Perl version) you’re working on. So far, my very limited experiments (using Strawberry Perl 5.22.1 on Windows 8.1 64-bit [1]) have not revealed any difference in memory usage between the two approaches.

    [1] Substituting tasklist for top.

    Update (Feb 1, 2016): I see you’ve made significant changes to your original post. The reference to Devel::Peek has gone, the system call in sub show_size now includes an awk command, and output has been added. While this new information is useful, you should clearly mark the updates as such, so that other monks coming to the thread won’t be confused by the fact that my answer relates to a question which has since changed. Or just add the new information in a new post.

    I still don’t understand why you think $file_data is immune from garbage collection?

    Cheers,

    Athanasius <°(((><contra mundum Iustus alius egestas vitae, eros Piratica,