sulfericacid has asked for the wisdom of the Perl Monks concerning the following question:

I've been using Perl for a while now and I want to get into using the best methods to do certain things (better known as getting the most out of the available memory). I have a few basic memory questions and was wondering if anyone could help me so I can grow into a more efficient programmer.

  • Is tying over 5 hashes on a single page considered a waste of memory? If I try to tie, say, 20 databases at a time, would that, generally speaking, cause a big increase in load time? (A minimal tie sketch of what I mean follows this list.)
  • Would it use a lot of memory if I used an automated script to parse the text of a URL every 5 minutes?
  • Is it faster to do foreach (keys %hash) { $cnt++ } or to store the hash in an array and count the key/value pairs that way?
  • What are some common areas or traps that would increase memory wastage?
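
    By "tying" I mean something like this minimal DB_File sketch (the filename example.db is just an illustration):

        use strict;
        use warnings;
        use Fcntl;      # exports O_CREAT, O_RDWR
        use DB_File;

        # Tie a hash to an on-disk Berkeley DB file, so stores and
        # lookups go through the file instead of living in memory.
        tie my %db, 'DB_File', 'example.db', O_CREAT | O_RDWR, 0644, $DB_HASH
            or die "Cannot tie example.db: $!";

        $db{key} = 'value';     # written through to the file
        untie %db;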

    I appreciate any information you have on memory or speed efficiency that you could spare. Thanks for your help.

    "Age is nothing more than an inaccurate number bestowed upon us at birth as just another means for others to judge and classify us"

    sulfericacid

    Re: Memory /speed questions
    by Paladin (Vicar) on Jul 30, 2003 at 04:28 UTC
      Perl does trade off memory for speed in quite a few places, so if you are looking for super memory efficiency, Perl is probably not the best choice. That being said:
      • Depends on what you are doing with the tied hashes. You may want to look at using a full relational database like MySQL or PostgreSQL.
      • No more memory than using the same script once a day, or once a year. Once the script is finished running, it usually releases all the memory it used back to the OS (this is assuming you are using something like cron to call the script every 5 minutes, and not leaving it constantly running)
      • I haven't benchmarked it, but it's probably much faster to do $cnt = keys %hash; (see the benchmark sketch below)
      • Common memory wasters
        • Reading a whole file into memory, instead of a line at a time (see the sketch after this list)
        • Large data structures all stored in memory instead of on disk with a tied hash or array
        • Using large lists in a foreach, e.g. foreach (1..10000) { }. This kind of relates to the point above.
        • Probably more, but these three come to mind right off the bat, the first most often
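
      To illustrate the first waster, a minimal sketch of reading a file line by line rather than slurping it (big.log is an illustrative filename):

          use strict;
          use warnings;

          # Memory-friendly: only one line is in memory at a time.
          open my $fh, '<', 'big.log' or die "Cannot open big.log: $!";
          while (my $line = <$fh>) {
              # work on $line here
          }
          close $fh;

          # Memory-hungry alternative: the whole file lands in @lines at once.
          # my @lines = <$fh>;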

      You are always (or almost always) going to trade off memory for speed. You just have to decide which is more important.

      Update: Benchmarked the part I hadn't benchmarked yet.

      Update2: Added more memory wasters.
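
      For the curious, here is a minimal sketch of such a benchmark using the core Benchmark module (the hash contents are arbitrary):

          use strict;
          use warnings;
          use Benchmark qw(cmpthese);

          my %hash = map { $_ => 1 } 1 .. 10_000;

          cmpthese(-2, {
              loop_count  => sub { my $cnt = 0; $cnt++ for keys %hash; $cnt },
              scalar_keys => sub { my $cnt = keys %hash; $cnt },
          });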

        Using large lists in a foreach, e.g. foreach (1..10000) { }.

        Whilst this used to be so, in recent versions of perl (5.6.x/5.8.x) it is no longer the case.

        Neither for (1 .. 10_000_000) { ... }

        nor for ( @a ) { ... }

        will consume any (extra) memory, as they are implemented as iterators. In the latter case, the control variable (or $_) is aliased to the array element, so even if @a is huge, no additional memory is consumed by the construct.
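
        A quick demonstration of the aliasing:

            use strict;
            use warnings;

            my @a = (1, 2, 3);
            $_ *= 10 for @a;    # $_ aliases each element, so @a itself changes
            print "@a\n";       # prints: 10 20 30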


        Examine what is said, not who speaks.
        "Efficiency is intelligent laziness." -David Dunham
        "When I'm working on a problem, I never think about beauty. I think only how to solve the problem. But when I have finished, if the solution is not beautiful, I know it is wrong." -Richard Buckminster Fuller

        ...large data structures all stored in memory instead of on disk with a tied hash or array..

        In (prefork) mod_perl it's quite a common technique to load lots of (constant) stuff into memory at server startup, as it will get shared between all the children and save you CPU and disk accesses on each request. So it depends on your situation whether this really is a "waster" or not.

        Liz
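
        A minimal sketch of the idea, assuming a prefork mod_perl setup; My::Constants and its contents are hypothetical:

            # My/Constants.pm -- pulled in from startup.pl, i.e. loaded once
            # before Apache forks, so all children share this read-only data
            # (via copy-on-write) instead of each building its own copy.
            package My::Constants;
            use strict;
            use warnings;

            our %COUNTRY_NAMES = (
                de => 'Germany',
                nl => 'The Netherlands',
                us => 'United States',
            );

            1;

        startup.pl would just say use My::Constants; handlers can then read %My::Constants::COUNTRY_NAMES without each child paying for its own copy, as long as nothing writes to it.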

    Re: Memory /speed questions
    by Abigail-II (Bishop) on Jul 30, 2003 at 08:29 UTC
      Is tying over 5 hashes on a single page considered a waste of memory? If I try to tie, say, 20 databases at a time, would that, generally speaking, cause a big increase in load time?

      What is a "page"? Perl doesn't have any concept of "pages", so your question is a bit strange. If you mean whether tying 5 hashes in a single program is considered a waste of memory, it all depends what your program does. If your program calculates the first 100 prime numbers, even tying a single hash is probably a waste. If your program does a tax audit for all people in the US, it might not be a waste.

      As for "tying to 20 databases" (do you mean connecting?) it will certainly take some time. But whether that's a big increase depends on the rest of the program, doesn't? Consider your databases are heavily loaded, and on average it takes a second to connect to a single database. That 20 seconds. If your program runs for a week, the 20 seconds don't matter. If your program is run every 10 seconds, the 20 seconds do matter.

      The problem with your questions is that they can't be answered without knowing a lot more about the specific program.

      Would it use a lot of memory if I used an automated script to parse the text of a url every 5 minutes?

      That depends on what you find a lot, how much text there is to parse, how you parse it, and what you are doing with the results. The best way to get an answer is to write the program and test it.

      Is it faster to do foreach (keys %hash) { $cnt++ } or to store the hash in an array and count the key/value pairs that way?

      That's probably the wrong question to ask. It all depends on what your purpose in storing the data is. If all you care about is "how many", both a hash and an array are silly. If you want to do something else, then that something else decides whether a hash or an array is the most useful. Both scalar keys %hash and scalar @array are constant-time operations.

      What are some common areas or traps that would increase memory wastage?

      The first is: using Perl itself. Once you have committed yourself to using Perl, you are going to use a lot of memory. If you are in a situation where the memory usage of as few as 5 tied hashes is a valid concern, you shouldn't have used Perl in the first place.

      Use C. That seems to be ideal for you - there you have lots of control over the memory usage.

      Abigail

    Re: Memory /speed questions
    by Zaxo (Archbishop) on Jul 30, 2003 at 04:39 UTC

      On the third question, try my $cnt = keys %hash; it gets the size by placing keys in scalar context. The scalar function may be useful if you want to do that on the fly in list context, like print scalar keys %hash;. As for the rest, hashes generally use a lot of memory for the size of the data stored. Think twice about turning a bunch of them loose on a potentially large slurpy pile of data.

      If a scheduled job exits after each run, its memory is returned to the system, so repetitions don't crank up usage. That applies to forked jobs as well as cron scheduling.

      After Compline,
      Zaxo

    Re: Memory /speed questions
    by bobn (Chaplain) on Jul 30, 2003 at 04:33 UTC

      Someone has said "Premature optimization is the root of all evil", or something like that. So I try not to worry about stuff like this until I have to. My time and my users' time are much more expensive than effort on the computer's part, so at some point, good enough really is good enough.

      The other answer is "Try it and see". With Linux or *BSD, most folks can afford at least a minimal development environment - and if you're looking for things that cause you to hit limits, small can actually be better. Start a couple dozen processes doing what you propose and watch the machine in top or vmstat or some such.
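
      A minimal sketch of that experiment; do_work() is a hypothetical stand-in for whatever your script actually does:

          use strict;
          use warnings;

          sub do_work {
              my %h = map { $_ => $_ } 1 .. 50_000;   # hold some data
              sleep 30;                               # time to watch top/vmstat
          }

          my $copies = 24;
          for (1 .. $copies) {
              my $pid = fork;
              die "fork failed: $!" unless defined $pid;
              if ($pid == 0) {    # child
                  do_work();
                  exit 0;
              }
          }
          wait for 1 .. $copies;  # reap the children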

      --Bob Niederman, http://bob-n.com

        If the program is short lived and/or runs on a box dedicated to it, then ignoring memory consumption and cpu utilisation is fine.

        However, sulfericacid mentions "5 hashes on a single page", which I would take to mean he is working in a webserver environment. Excessive memory and CPU utilisation can be disastrous in environments where the number of copies of the process can be large. Especially so when that number is controlled by external forces.

        Whilst hardware is cheap, for individuals as well as many companies that rely upon ISPs or hosters for their hardware, the cost of purchasing (the use of) boxes large enough to handle peak loads, which then sit idle 90% of the time, is prohibitive. One only has to look at the sluggishness of this site a few months ago, before Pair kindly donated additional hardware resources, to see that it isn't always a simple case of economics.

        Taking elementary steps to avoid wasting resources is far from "premature optimisation". Even when writing simple apps, understanding what costs and what doesn't is not "evil"; it's common sense.

        There is a compromise between bloat and unmaintainable, over-optimised code. The key to finding that compromise is understanding. Branding every request that mentions "efficient", "fast" or "use less" as premature optimisation is to deny that there is a problem. The barrier to understanding is the denial of the problem and the possibility of solutions.


        Examine what is said, not who speaks.
        "Efficiency is intelligent laziness." -David Dunham
        "When I'm working on a problem, I never think about beauty. I think only how to solve the problem. But when I have finished, if the solution is not beautiful, I know it is wrong." -Richard Buckminster Fuller

          I didn't say what you say I said.

          If the page is used twice a day by one person, it still doesn't matter. I've got pages that are used by 1 or 2 people twice a day tops, so I don't care how they work, beyond that they are correct.

          If he's actually looking at a situation with many hits, then it starts to matter - but I also said 'try it out and see'.

          --Bob Niederman, http://bob-n.com
        And if you really want to fine-tune your memory usage at the Perl level, there's Dan Sugalski's Devel::Size, of course.

        Liz
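
        For example (exact numbers vary by platform and perl build; the hash is arbitrary):

            use strict;
            use warnings;
            use Devel::Size qw(size total_size);

            my %hash = map { $_ => 'x' x 100 } 1 .. 1_000;

            print size(\%hash),       " bytes for the hash structure itself\n";
            print total_size(\%hash), " bytes including the keys and values\n";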

    Re: Memory /speed questions
    by TomDLux (Vicar) on Jul 30, 2003 at 17:28 UTC

      What Abigail said.

      But also, consider the relative cost of your various questions. If you connect to a URL, whether once or every five minutes, who cares which method of counting key/value pairs is faster? Connecting to a URL will take a significant number of milliseconds or even seconds. Any method of counting pairs will take microseconds, unless your data set is unusually large.

      I guess that's another way of saying that if you profiled your program, you would find out which questions are actually significant to consider.
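
      A minimal sketch of that comparison, assuming LWP::Simple is installed and the (example) URL is reachable; note that most of the fetch cost shows up as wall-clock time rather than CPU time:

          use strict;
          use warnings;
          use Benchmark qw(timethese);
          use LWP::Simple qw(get);

          my %hash = map { $_ => 1 } 1 .. 10_000;

          timethese(10, {
              fetch_url  => sub { my $html = get('http://www.example.com/') },
              count_keys => sub { my $cnt  = keys %hash },
          });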

      --
      TTTATCGGTCGTTATATAGATGTTTGCA