magarwal has asked for the wisdom of the Perl Monks concerning the following question:

Hi,

I am reading a 2GB file into a Perl hash and then dumping the data to the screen. It appears to be using much more than 2GB of memory.
Is this expected behaviour, and how much memory should this file ideally take?

How can I best estimate memory usage when writing Perl code?

Please let me know your inputs.

Thanks,
Manu

Replies are listed 'Best First'.
Re: Memory usage by perl application
by Corion (Patriarch) on Dec 22, 2010 at 15:30 UTC

    See illguts. Basically, every scalar value (SV in internal-speak) in Perl takes up 16 bytes plus whatever payload (think "string length") is stored in the value. Hash keys are also refcounted and may themselves be SVs. So, if you store very many relatively small items in your hash, as keys and values, your memory needs might be up to 16 or 32 times the size of the input file, at least if the keys are all different.
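    If you would rather measure the footprint than guess at it, here is a rough sketch using the CPAN module Devel::Size (assuming it is installed on your system):

        use strict;
        use warnings;
        use Devel::Size qw(total_size);

        # 100_000 keys, each holding a one-character value
        my %h = map { $_ => "x" } 1 .. 100_000;

        printf "value payload alone: %d bytes\n", 100_000;          # one byte per value
        printf "total in memory:     %d bytes\n", total_size(\%h);  # walks the structure, counts SV and hash overhead too

    The gap between the two numbers is the per-SV and per-bucket overhead described above.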

    You can easily move your hash to disk by using DB_File or one of the other tied hash implementations (SDBM_File, GDBM_File). This means your hash access is slower, but you are limited only by disk space, not core memory.
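    A minimal sketch of such a tied hash with DB_File (the file name counts.db and the key are just examples):

        use strict;
        use warnings;
        use DB_File;
        use Fcntl;

        my %counts;
        # Lookups and stores now go to the Berkeley DB file on disk instead of core memory.
        tie %counts, 'DB_File', 'counts.db', O_RDWR|O_CREAT, 0644, $DB_HASH
            or die "Cannot tie hash to counts.db: $!";

        $counts{some_key} = 42;            # written to disk
        print $counts{some_key}, "\n";     # read back from disk

        untie %counts;                     # flush and close the database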

    Alternatively, maybe you can easily save memory by simply not reading the whole file into a hash, by changing to a different algorithm. But for that, we will need to see your data and code.

      Hi Corion,

      I am doing this just as an exercise to check memory usage by the system. Please find my code snippet below:
      open FILE, $input_file or die $!;
      my @data;
      while (<FILE>) {
          push @data, $_;
      }
      close FILE;
      The file I am reading is 2GB in size.
      In this scenario, I would expect the total memory usage to be around 2GB, or at most something like 4GB. My system shows about 3.8GB of memory usage.
      Does the filehandle FILE also store the full file in memory, in addition to my array holding the full file?

      Is this acceptable? Please let me know your inputs.

      Thanks,
      Manu

        In your first post, you talked about a hash. I see no hash in your code.

        See illguts, again. It talks about the underlying data structures and their memory needs.

        Depending on how large each line in $input_file is, Perl will, again, use up to 16 times the memory (based on the worst case where each line is one character long and carries an overhead of 16 bytes, disregarding the SvPV entry and the overhead of the array itself).

        This behaviour is acceptable to me. There are very few reasons to read a file completely into an array. If you really need to handle generic large data structures, most likely a database like Postgres or SQLite will suit your needs far better than storing the data in Perl structures will.
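        As a sketch of the line-by-line alternative (the file name below is just a stand-in for your $input_file):

            use strict;
            use warnings;

            my $input_file = 'big.log';    # stand-in for the real path

            open my $fh, '<', $input_file or die "Cannot open $input_file: $!";
            my $lines = 0;
            while (my $line = <$fh>) {
                # work on $line here; only one line lives in memory at a time,
                # so memory use stays flat no matter how large the file grows
                $lines++;
            }
            close $fh;

            print "processed $lines lines\n";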