http://qs1969.pair.com?node_id=200678

Fellow Monks,

Perl's memory requirements have been the subject of several discussions already; nevertheless, I'd like to reiterate parts of some discussions I heard at YAPC::Munich:

Some people (myself included) argued that Perl's memory requirements are too leechy. This is a well-known issue, and in the past the only answer to it was "Well - it's a design decision. Just live with it."

Those who cannot live with it so well got interesting support from Nicholas Clark's talk When perl is not quite fast enough. As a side note, he mentioned that over the past few years the speed of processors has increased far more than the speed (bandwidth) of memory.

He is absolutely right about this, as he is with his assumption that this trend will continue. Given that, a short discussion started in which the audience quickly realized that the design decision "trading memory for speed" may very well backfire - if it hasn't done so already.

Another talk - Perl 5.10 by Hugo van der Sanden - also touched on this issue. Hugo's point was to have some runtime options for requesting lower memory usage by trading away some speed. He rejected compile-time options for building a less memory-hungry perl, because they would mean more code to maintain.

My own considerations, which started because of the special requirements of our AI applications, led me to quantify the tradeoff we now have to make:

Our machines are 32-bit, and because of cost there will be no big iron (64-bit with more RAM than 32-bit can address) used for our development. So the maximum memory (at reasonable speed - not segmented) is 4GB. Unfortunately this is not enough for the Perl application, so we have to tie hashes to files located on high-performance RAID I/O subsystems - but even so, that I/O is a factor of 60 slower than memory access, in both latency and bandwidth.
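For concreteness, a disk-backed tie of that kind looks roughly like the following DB_File sketch (path, permissions and data are made up):

    use strict;
    use DB_File;
    use Fcntl;

    # Tie a Perl hash to a Berkeley DB file on the RAID volume;
    # every lookup now goes through the filesystem instead of RAM.
    my %lexicon;
    tie %lexicon, 'DB_File', '/raid/data/lexicon.db',
        O_RDWR | O_CREAT, 0644, $DB_HASH
        or die "Cannot tie lexicon.db: $!";

    $lexicon{'word'} = 42;
    print $lexicon{'word'}, "\n";

    untie %lexicon;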

Given a Perl that executes hash lookups a factor of 10 slower but carries only 20 to 30% memory overhead (not 1000% as it does now), we could fit everything into memory and still run about a factor of 6 faster than now: lookups would cost 10 times more than today's in-memory lookups, but we would save the factor-of-60 penalty of going to disk, for a net gain of roughly 60/10 = 6.

Memoize this for the next design decision...

Update: clarified what I mean by "big iron".

Bye
 PetaMem

Replies are listed 'Best First'.
Re: STOP Trading Memory for Speed
by Elian (Parson) on Sep 25, 2002 at 18:34 UTC
    Are you in a position to throw any development resources towards perl itself? Hugo's concern about maintainability is definitely warranted, as there's a limited amount of engineering resource available to perl, and past decisions have definitely made an impact on what's easy and what's hard with the current code base. The best way to make sure what you want to have happen actually happens is to help make it happen.

    Update: This isn't meant to be snide or anything--many people are unaware of how much time and effort it takes to add even seemingly simple features to a large and complex project like perl, nor how much time you have to spend afterwards maintaining that code...

Re: STOP Trading Memory for Speed
by perrin (Chancellor) on Sep 25, 2002 at 19:02 UTC
    Have you tried using the latest BerkeleyDB with an in-memory only database? It may be fast enough and small enough for your needs. If that doesn't work, you could consider using a C hash library and writing a thin Perl front-end to it (maybe with Inline::C), which would give you total control of the tradeoff in speed vs. memory for your application.
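    A minimal sketch of the in-memory variant with the BerkeleyDB module (as far as I recall, omitting -Filename is what makes the database memory-only; the cache size here is an arbitrary example):

        use strict;
        use BerkeleyDB;

        # No -Filename: the database lives entirely in memory.
        # -Cachesize (bytes) bounds how much RAM Berkeley DB uses for its cache.
        my %data;
        tie %data, 'BerkeleyDB::Hash',
            -Flags     => DB_CREATE,
            -Cachesize => 256 * 1024 * 1024
            or die "Cannot create in-memory DB: $BerkeleyDB::Error";

        $data{key} = 'value';
        print $data{key}, "\n";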
      One other thought: have you tried just letting it swap? Depending on your algorithm, the VM system might do a great job of efficiently keeping the part you care about in RAM while swapping the rest to disk. That was my experience with a very large data set on FreeBSD. We didn't even realize it had started swapping because it slowed down so little.
        You didn't read his problem.

        The memory limit is not physical memory, it is the fact that you can only address 4 GB with a 32-bit pointer.

        BerkeleyDB might solve this by taking less memory. It might also be able to use anonymous paging to work even though it cannot directly address most of that memory. The documentation says that data sets that large usually speed up by switching from hashing to a BTREE.

        But letting it swap won't work.
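        A minimal sketch of the hash-to-BTREE switch mentioned above, again using the BerkeleyDB module (the filename is illustrative; everything else stays as in the hash case):

            use strict;
            use BerkeleyDB;

            # Same kind of tie as BerkeleyDB::Hash, but with the BTREE
            # access method, which the Berkeley DB docs suggest for very
            # large data sets.
            my %big;
            tie %big, 'BerkeleyDB::Btree',
                -Filename => '/raid/data/big.db',
                -Flags    => DB_CREATE
                or die "Cannot open BTREE DB: $BerkeleyDB::Error";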

Re: STOP Trading Memory for Speed
by diotalevi (Canon) on Sep 25, 2002 at 18:06 UTC

    This is also highly relevant for those of us that have convinced our ISPs to run a mod_perl process but want to keep server memory utilization lower so it isn't such an admin headache. At the lower end of the spectrum anyway ;-)

      It doesn't actually reduce memory usage, but Apache::SizeLimit can keep your processes under control so that you don't have to have an admin watch them.
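      The usual setup under mod_perl 1 looks roughly like this sketch (the thresholds are made-up values, and you should check the module's docs for the exact handler phase):

          # In startup.pl (or any file pulled in with PerlRequire):
          use Apache::SizeLimit;

          # Kill a child after the current request once it has grown
          # beyond ~12 MB, checking only every other request.
          $Apache::SizeLimit::MAX_PROCESS_SIZE       = 12_000;   # in KB
          $Apache::SizeLimit::CHECK_EVERY_N_REQUESTS = 2;

          # In httpd.conf (handler phase from memory - verify it):
          #   PerlFixupHandler Apache::SizeLimit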
Re: STOP Trading Memory for Speed
by Aristotle (Chancellor) on Sep 25, 2002 at 20:38 UTC

    There's a flaw in your argumentation.

    1. Your assertion is: CPU speed is increasing faster than memory bandwidth.
    2. Your example shows: IO speed is an order of magnitude lower than memory bandwidth.
    3. But by all we know at this point, if your dataset were smaller or your memory larger, the 1000% overhead approach would beat a 30% overhead one.

    Point 3 refutes point 1, with point 2 bearing no relevance to either. All you have proven is:

    You can trade memory to gain speed - so long as you have enough memory.

    That, I believe, isn't news to anyone. The fact that the 1000% overhead forced you to go for disk IO rather than all-in-memory is irrelevant because it doesn't have any impact on the relation between CPU speed and memory bandwidth.

    You do have a point in complaining about the high overhead approach forcing you to take refuge in disk IO, but that's an entirely different argument from saying that trading memory for clock cycles will backfire when an application fits in memory.

    Which of these two different points do you want to discuss?

    Makeshifts last the longest.

      You're missing some rather fundamental points here. On some CPU architectures you can see stall times of up to 200 cycles when accessing memory that isn't in L1 or L2 cache, and with large data sets this can be a significant overhead. Significant enough that in some cases it's actually faster to have a more compact representation in memory (and thus a higher density of real data in cache) with extra code to decode that data than it is to have simple code with relatively loosely packed data.

      For example, while it's much more efficient on most RISC architectures to use 32-bit or 64-bit data elements rather than 8-bit ones (since 8-bit access generally costs a mask-and-shift operation, internally or often externally), if using only 8-bit data means a larger portion of your data set fits in cache with fewer spills to main memory (or even to L2 cache), then the extra cycles spent decoding your data are fewer than the cycles you would otherwise spend stalled on the memory bus.
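      A Perl-level analogue of that trade, sketched with vec(): the packed string costs a little decode work on every access, but stores one byte per element instead of a full Perl scalar (the sizes are arbitrary):

          use strict;

          my $n = 1_000_000;

          # Loose representation: one full Perl scalar per counter,
          # tens of bytes each once the SV overhead is counted.
          my @counts = (0) x $n;

          # Compact representation: one byte per counter in a single
          # string, decoded on access with vec() for a few extra cycles.
          my $packed = "\0" x $n;

          vec($packed, 42, 8) = 200;        # store counter 42
          print vec($packed, 42, 8), "\n";  # fetch it back: 200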

        I've done a fair bit of assembly programming and am perfectly aware of issues of cache locality, but that issue alone doesn't make his example any more relevant.

        There's also the opposite case, where the inner loops don't fit into L1, which turns out to be an order of magnitude more costly than fetching data from the slower memory tiers. It's harder to hit that limit, of course, but since most CPUs have much smaller caches for code than for data, it's a real possibility nevertheless, considering the appeal is that we should routinely trade speed for data size.

        It's rarely quite as easy as "stop trading memory for speed!" or "lookup tables make things faster". Top performance, if it is needed at all, cannot be achieved any other way than by benchmarking and using a low-level profiler to find out exactly where the bottlenecks are. And since bottlenecks usually interdepend, you often have to be careful not to make an "improvement" that actually worsens the overall situation.
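        A proper profiler (Devel::DProf and friends) goes deeper, but even a quick Benchmark comparison shows how non-obvious these trades are - a sketch with a toy workload:

            use strict;
            use Benchmark qw(cmpthese);

            # Compare a lookup table against recomputation; which one wins
            # depends on data size and cache behaviour, hence: measure.
            my @table = map { $_ ** 2 } 0 .. 999;

            cmpthese(-1, {
                lookup  => sub { my $x = $table[ int rand 1000 ] },
                compute => sub { my $n = int rand 1000; my $x = $n ** 2 },
            });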

        Makeshifts last the longest.

      Hi,

      Nice logical analysis and argumentation; unfortunately it is not quite correct, as the "two different points" are entangled and thus have to be discussed together.

      Let me try again:

      There is a gap between memory speed and CPU speed. This gap has grown over the years and will probably continue to grow - or let's even say it will remain constant. It certainly will not shrink or invert as long as we have traditional von Neumann machines instead of systolic arrays or whatever.

      Therefore: yes, there are algorithms that gain speed by trading memory for it, even if the memory is slower than the processing element. But our computers are not theoretical Turing machines with infinite resources; there are hard limits. One of these hard limits is the 4GB direct-addressing limit of 32-bit architectures.

      If you hit this limit, your design decisions become severely constrained. Often you have to fall back on an implementation that uses the "remaining features" of your hardware - in our case, the (relatively) enormous amount of disk space compared with RAM.

      So: IF the gap between memory and CPU speed continues to grow (the most likely scenario), trading memory for clock cycles may very well backfire. Admittedly this is a long-term view and should only be seen as an auxiliary argument for rethinking perl's memory-usage design philosophy (or dogma?).

      The more important argument is the real-world situation: by trading very (too) much memory for clock cycles, you hit the hard limits sooner, which forces you onto features of today's hardware that are inferior to the ones the hard limit locks you out of.

      So my whole argument is: IMHO the design decision "memory for speed" is weighted too far in one direction. Perl would gain a big leap in flexibility if a pragma (or whatever) could state something like "use LessMem;" - every user could then decide whether the resulting gain in available memory and loss in speed is acceptable for his applications. Those of us who hit the hard limits of today's hardware could happily live with such a perl until a 64-bit machine with 40GB of RAM becomes commodity hardware. And even then, some might wish not to hit THAT hard limit...

      As for the remark in this thread about the development resources we'd be willing to invest in this: well, we have no knowledge of the perl internals, but this problem is critical to our application, so yes - we'll probably try some Inline::C and a C hash library, and if none of that helps, we'll dig into perlguts. :-)

      Bye
       PetaMem

        You say:
        Yes, there are algorithms that gain speed by trading memory for it, even if the memory is slower than the processing element. But our computers are not theoretical Turing machines with infinite resources; there are hard limits. One of these hard limits is the 4GB direct-addressing limit of 32-bit architectures.
        I said:
        You do have a point in complaining about the high overhead approach forcing you to take refuge in disk IO, but that's an entirely different argument from saying that trading memory for clock cycles will backfire when an application fits in memory.

        You're not saying anything I hadn't already covered. Yes, your points are correct, but you made the same mistake as the original poster of applying the "too much memory forces me onto disk" argument to the "cache misses are getting more and more costly" statement.

        These have nothing to do with each other. To argue in favour of the latter, you have to show an example where trading code size for data size actually accelerated an application that had not hit the hard limit to begin with.

        Note I did not comment on the "cache misses are getting more and more costly" statement. Although I don't want to follow the suggestion of trading code size for data size - not only because I believe the disparity is not yet large enough to justify the trade in most cases, but mainly because I frequently find data-intensive designs much easier to maintain than code-intensive ones - I know that the statement is true and will at some point in the future require attention.

        Makeshifts last the longest.

        Perl would gain a big leap in flexibility if a pragma (or whatever) could state something like: "use LessMem;"

        The perl porters are aware of this, as the documentation for the less pragma shows:

        NAME
            less - perl pragma to request less of something from the compiler

        SYNOPSIS
            use less;  # unimplemented

        DESCRIPTION
            Currently unimplemented, this may someday be a compiler directive
            to make certain trade-offs, such as perhaps

                use less 'memory';
                use less 'CPU';
                use less 'fat';

        Of course, as it is unimplemented, it is still useless in practice, but you could try convincing the perl 5 porters that it should at least do something in 5.10.

        -- Joost
        downtime n. The period during which a system is error-free and immune from user input.
Re: STOP Trading Memory for Speed
by jepri (Parson) on Sep 26, 2002 at 00:57 UTC
    After reading all these responses it occurs to me that you are overlooking one other thing - what happens when your dataset grows again? You might be able to reduce perl's memory use to 50%, 25% or 1% of its current bloat, but your dataset will keep growing, and there's a real limit on how much overhead you can whittle away.

    Why not shift your data storage off to a cluster? You could do this in many ways. There are distributed database products, or you could roll your own (it is very easy).

    There are also various types of clustering software, which may or may not be appropriate depending on exactly how you are processing your data.

    Finally, keep in mind that gigabit Ethernet is probably faster than even your RAID array's transfer capability. A couple of computers with 1GB of memory each, linked up with gigabit network cards, will thrash almost any disc array, and still be cheaper than that 64-bit UltraSPARC.

    My favourite solution always was 'just add another computer'.
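    As a rough illustration of the "roll your own" option above: a network-backed hash really is just a tied hash whose FETCH and STORE talk to another box. A minimal sketch (the Tie::RemoteHash name, the host/port and the line protocol are all made up, the matching server is not shown, and only FETCH and STORE are implemented):

        package Tie::RemoteHash;
        use strict;
        use IO::Socket::INET;

        sub TIEHASH {
            my ($class, $host, $port) = @_;
            my $sock = IO::Socket::INET->new(
                PeerAddr => $host,
                PeerPort => $port,
                Proto    => 'tcp',
            ) or die "connect failed: $!";
            return bless { sock => $sock }, $class;
        }

        sub FETCH {
            my ($self, $key) = @_;
            my $sock = $self->{sock};
            print $sock "GET $key\n";             # ask the storage node
            chomp(my $value = <$sock>);
            return $value;
        }

        sub STORE {
            my ($self, $key, $value) = @_;
            my $sock = $self->{sock};
            print $sock "SET $key $value\n";
            my $ack = <$sock>;                    # wait for acknowledgement
            return $value;
        }

        package main;
        my %data;
        tie %data, 'Tie::RemoteHash', 'node2.example.com', 4242;
        $data{colour} = 'blue';
        print $data{colour}, "\n";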

    ____________________
    Jeremy
    I didn't believe in evil until I dated it.

Re: STOP Trading Memory for Speed
by Lexicon (Chaplain) on Sep 25, 2002 at 22:06 UTC
    Thanks for your editorialized title, but you're overhyping. In my Mathematics::Combinatorics::Combinator, the memory/speed trade-off was one of those lifetime-of-the-universe sorts of time differences. The CPU/memory speed ratio isn't ever likely to get that big. Hence, the memory/speed trick will always be a useful tool.
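    For anyone who hasn't seen that kind of trade in the small, the standard Memoize module is the textbook example - spend memory on a cache, save astronomical amounts of time (the function here is just a toy recursion):

        use strict;
        use Memoize;

        # Naive recursive Fibonacci: exponential time without caching.
        sub fib {
            my $n = shift;
            return $n if $n < 2;
            return fib($n - 1) + fib($n - 2);
        }

        # Cache every result: memory spent, the running time collapses.
        memoize('fib');

        print fib(50), "\n";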

    It is almost assuredly less work for you to write your own memory management than for Perl to change whatever you feel is necessary to fix your problem. If you want more control and speed, use C. And who knows what will happen with Perl 6 anyway.

Re: STOP Trading Memory for Speed
by BrowserUk (Patriarch) on Sep 25, 2002 at 23:56 UTC

    Without at least a concept of the structure of your application, it's difficult to see what alternatives there are to your problem, other than the one you can't have. Maybe that's the point, but I was party to a solution to a similar problem, where maxed-out memory forced a partially disc-based dataset, which resulted in an unacceptable reduction in performance.

    The solution we arrived at was to partition both the dataset and the algorithms and run them as two halves on two separate machines. Communication was done through a dedicated 100Mb/s FDDI link. There was talk of moving to a 155Mb/s ATM link, but this proved unnecessary.

    It was initially thought that putting half the data and half the code on each box was the way forward, but in the end the codeset on the two boxes was almost identical, and it actually gained some weight due to the (surprisingly small) extra code needed to handle the communications.

    The simple act of partitioning the dataset in two freed enough memory that the additional code was handled without penalty.

    Just a possibility.


    Cor! Like yer ring! ... HALO dammit! ... 'Ave it yer way! Hal-lo, Mister la-de-da. ... Like yer ring!
Re: STOP Trading Memory for Speed
by rob_au (Abbot) on Sep 26, 2002 at 00:55 UTC
Re: STOP Trading Memory for Speed
by Abigail-II (Bishop) on Oct 02, 2002 at 11:07 UTC
    Well, I read the article (I wasn't at YAPC::Munich), and there it's just a side note. Nothing as dramatic as you sketch it here.

    Our machines are 32bit and because of cost there will be no big iron (64bit) used for our development.
    Since when is 64 bits "big iron"? By "big iron", people usually mean mainframes. But Sun hardware, for instance, has been 64-bit for years - even their cheapest desktop (which used to be less than $1000 - I don't know if that's still the price). "All the world is Intel" is no more true than "all the world is a VAX" used to be.

    I'm willing to bet that 64-bit hardware will be common way before CPUs are so much faster than memory that trading "memory for speed" isn't worthwhile anymore.

    One should also realize that the "memory for speed" trade also gains you a lot of flexibility. If we didn't have the overhead we have for values, we couldn't easily implement operations like push, .=, or length in (average) constant time - they would suddenly be linear in the size of the data. It would also mean symbolic references wouldn't work the way they do now. You'd end up with something that wouldn't be Perl anymore.
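    You can see that per-value overhead, and part of what it buys, with Devel::Peek - a small sketch: CUR is the stored string length that makes length() constant time, and LEN is the over-allocated buffer that lets .= append without copying on every concatenation.

        use strict;
        use Devel::Peek;

        my $s = "hello";
        $s .= " world" for 1 .. 3;   # grow the string a few times

        # Dump() prints the scalar's internal fields, among them CUR
        # (current length, so length($s) needs no scan) and LEN (the
        # allocated buffer, usually larger than CUR so appends are cheap).
        Dump($s);
        print length($s), "\n";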

    Abigail

      When do you bet that 64-bit hardware will be common?

      16-bit hardware and paging extensions lasted from the Commodore to the 386. Intel's PAE hack and the lack of a visible 64-bit consumer strategy (Itanium is not a consumer strategy) suggest that they will try to put off the 64-bit migration in the same way. Given the performance and space hits you take with 64 bits, that approach might win.

      As for memory for speed: if the amount of pre-buffering assigned to arrays, the fill level at which hashes split their buckets, and similar parameters were tunable at runtime, then a "use less" could be made to work. Perl would still trade memory for performance, but how much could be adjusted significantly.
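      Some of those knobs already exist per variable, which hints at what a global runtime setting could look like - a small sketch (the sizes are arbitrary, and note that both knobs currently point in the "more memory, more speed" direction; the suggestion is to expose the opposite direction too):

          use strict;

          # Pre-extend an array so perl allocates its buffer once up
          # front instead of growing it repeatedly.
          my @rows;
          $#rows = 9_999;

          # Pre-size a hash's bucket array the same way (lvalue keys).
          my %index;
          keys(%index) = 8_192;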

      Oh - come on Abigail,

      64 bits from Sun at $1000? You're joking. Not even the recent 32-bit Linux systems from Sun cost less than US$1500 with 256MB RAM - and those are PCs with a Sun label.

      And I say "big iron", because the memory overhead in perl is much worse for 64bit architectures than it is for 32bit, so to get everything to memory you get on a 32bit/4GB config, you need an up to 64bit/12GB configuration. YES factor of 3. So to double your capacity with the help of 64bit, you need to buy BIG iron with at least 24GB. Try it. And try to get this for something less than 50.000US$

      But I'm happy that most of the replies (and the current considerations for 5.10) give this problem more focus. So the most realistic scenario for us is that solutions will come from three directions:

      • We are designing the application's architecture to be partitionable, so it can be distributed over n nodes of a cluster.
      • Some work on the perl side will surely be done to lower Perl's memory overhead (thanks for the pointer to the use less pragma).
      • Maybe in 3-5 years we'll really have 64-bit machines with xx GB of RAM for a few bucks and will build clusters out of those.
      If all that works out, I'll be a happy man. :-)

      Bye
       PetaMem