Re: STOP Trading Memory for Speed

by Aristotle (Chancellor)
on Sep 25, 2002 at 20:38 UTC


in reply to STOP Trading Memory for Speed

There's a flaw in your argumentation.

  1. Your assertion is: CPU speed is increasing faster than memory bandwidth.
  2. Your example shows: IO speed is an order of magnitude lower than memory bandwidth.
  3. But for all we know at this point, if your dataset were smaller or your memory larger, the 1000% overhead approach would beat a 30% overhead one.

Point 3 refutes point 1, with point 2 bearing no relevance to either. All you have proven is:

You can trade memory to gain speed - so long as you have enough memory.

That, I believe, isn't news to anyone. The fact that the 1000% overhead forced you to go for disk IO rather than all-in-memory is irrelevant because it doesn't have any impact on the relation between CPU speed and memory bandwidth.
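
To make the trade concrete, here is a minimal sketch of the classic memory-for-speed move: caching the results of an expensive computation in a hash. The function and workload are invented for illustration; the point is that the cache only pays off while it fits comfortably in RAM.

    use strict;
    use warnings;

    my %cache;   # grows without bound: the memory we spend to buy speed

    # Stand-in for some expensive computation.
    sub slow_metric {
        my ($n) = @_;
        my $sum = 0;
        $sum += sqrt $_ for 1 .. $n;
        return $sum;
    }

    sub cached_metric {
        my ($n) = @_;
        $cache{$n} = slow_metric($n) unless exists $cache{$n};
        return $cache{$n};
    }

    print cached_metric(500_000), "\n";   # computed the slow way
    print cached_metric(500_000), "\n";   # answered from the cache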

You do have a point in complaining about the high overhead approach forcing you to take refuge in disk IO, but that's an entirely different argument from saying that trading memory for clock cycles will backfire when an application fits in memory.

Which of these two different points do you want to discuss?

Makeshifts last the longest.

Replies are listed 'Best First'.
Re: Re: STOP Trading Memory for Speed
by Elian (Parson) on Sep 25, 2002 at 23:16 UTC
    You're missing some rather fundamental points here. On some CPU architectures you can see stall times of up to 200 cycles when accessing memory that isn't in L1 or L2 cache, and with large data sets this can be a significant overhead. Significant enough that in some cases it's actually faster to have a more compact representation in memory (and thus a higher density of real data in cache) with extra code to decode that data than it is to have simple code with relatively loosely packed data.

    For example, while it's much more efficient on most RISC architectures to work with 32-bit or 64-bit data elements than with 8-bit ones (8-bit accesses generally cost an extra mask-and-shift operation, done internally by the CPU or often explicitly in code), if using only 8-bit data means a larger portion of your data set fits in cache with fewer spills to main memory (or even to L2 cache), then the extra cycles spent decoding your data are fewer than the cycles you would otherwise spend stalled on the memory bus.
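
    As a Perl-level analogue of that trade-off (a sketch, not something from the post above): storing byte-sized values as individual scalars is the loosely packed layout, while packing them into one string and decoding each element with vec() is the compact layout that pays extra decode work per access.

        use strict;
        use warnings;

        # Loosely packed: one full Perl scalar per byte-sized value.
        my @values = map { $_ % 256 } 0 .. 9_999;
        my $sum_loose = 0;
        $sum_loose += $_ for @values;

        # Compact: the same values packed into a single 10_000-byte string;
        # each access pays an extra decode step (vec), but the data set is
        # far denser, so much more of it fits in cache.
        my $packed = pack 'C*', @values;
        my $sum_packed = 0;
        $sum_packed += vec($packed, $_, 8) for 0 .. length($packed) - 1;

        print "$sum_loose $sum_packed\n";   # same result, very different footprints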

      I've done a fair bit of assembly programming and am perfectly aware of issues of cache locality, but that issue alone doesn't make his example any more relevant.

      There's also the opposite case, where the inner loops don't fit into L1, which turns out to be an order of magnitude more costly than fetching data from the slower memory tiers. It's harder to hit that limit, of course, but as most CPUs have much smaller caches for code than for data, it's a real possibility nevertheless, considering the appeal is that we should routinely trade speed for data size.

      It's rarely quite as easy as "stop trading memory for speed!" or "lookup tables make things faster". Top performance, if it is needed at all, cannot be achieved any other way than by benchmarking and by using a low-level profiler to find out exactly where the bottlenecks are. And since several bottlenecks usually interdepend, you often have to be careful not to make an "improvement" that actually worsens the overall situation.
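
      As an illustration of that last point (an added sketch, not part of the original reply): Perl's standard Benchmark module can compare two candidate representations directly, and only such a measurement on the real workload - ideally backed by a profiler - tells you which trade actually wins on a given machine. The two subs below are placeholders.

          use strict;
          use warnings;
          use Benchmark qw(cmpthese);

          my @values = map { $_ % 256 } 0 .. 9_999;
          my $packed = pack 'C*', @values;

          # Run each variant for at least two CPU seconds and compare rates.
          cmpthese(-2, {
              array_of_scalars => sub {
                  my $sum = 0;
                  $sum += $_ for @values;
              },
              packed_string => sub {
                  my $sum = 0;
                  $sum += vec($packed, $_, 8) for 0 .. length($packed) - 1;
              },
          });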

      Makeshifts last the longest.

Re: Re: STOP Trading Memory for Speed
by PetaMem (Priest) on Oct 02, 2002 at 10:25 UTC
    Hi,

    nice logical analysis and argumentation, but unfortunately not quite correct, as the "two different points" are entangled and therefore have to be discussed together.

    Let me try again:

    There is a gap between memory speed and CPU speed. This gap has grown in the past and will probably continue to grow - or let's even say it will remain constant. It certainly will not shrink or invert as long as we have traditional von Neumann machines rather than systolic arrays or whatever.

    Therefore: yes, there are algorithms that gain speed by trading memory for it, even if the memory is slower than the processing element. But our computers are not theoretical Turing machines with infinite resources; there are hard limits. One of these hard limits is direct 4GB addressing on 32-bit architectures.

    If you hit this limit, your design decisions become severely constrained. Often you have to fall back on an implementation that uses the "remaining features" of your hardware - in our case, the (relatively) enormous amount of disk space compared with RAM.

    So: IF the gap between memory and CPU speed continues to grow (the most likely scenario), trading memory for clock cycles may very well backfire. Admittedly this is a long-term view and should only be seen as an auxiliary argument for rethinking perl's memory usage design philosophy (or dogma?).

    The more important argument is the real-world situation: by trading too much memory for clock cycles, you hit the hard limits sooner, which forces you onto features of today's hardware that are inferior to the ones the hard limit puts out of reach.

    So my whole argument is: IMHO the design decision "memory for speed" is weighted too far in one direction. Perl would gain a big leap in flexibility if a pragma (or whatever) could state something like "use LessMem;". Every user could then decide whether the resulting gain in available memory and loss in speed is acceptable for his applications. Those of us who hit the hard limits of today's hardware could live with such a perl until a 64-bit machine with 40GB of RAM becomes commodity hardware. And even then some might wish not to hit THAT hard limit...

    As for the remark in this thread about the development resources we'd be willing to invest in this: well, we have no knowledge of the perl internals, but this problem is critical to our application, so yes - we'll probably try some Inline C, a C hash library, and if none of these help, we'll dig into perlguts. :-)

    Bye
     PetaMem

      You say:
      Yes, there are algorithms that gain speed by trading memory for it, even if the memory is slower than the processing element. But our computers are not theoretical Turing machines with infinite resources; there are hard limits. One of these hard limits is direct 4GB addressing on 32-bit architectures.
      I said:
      You do have a point in complaining about the high overhead approach forcing you to take refuge in disk IO, but that's an entirely different argument from saying that trading memory for clock cycles will backfire when an application fits in memory.

      You're not saying anything I hadn't already covered. Yes, your points are correct, but you made the same mistake as the original poster of applying the "too much memory forces me onto disk" argument to the "cache misses are getting more and more costly" statement.

      These have nothing to do with each other. To argue in favour of the latter, you have to show an example where trading code size for data size actually accelerated an application that had not hit the hard limit to begin with.

      Note I did not comment on the "cache misses are getting more and more costly" statement. Although I don't want to follow the suggestion of trading code size for data size - not only because I believe the disparity is not yet large enough to justify the trade in most cases, but mainly because I frequently find data-intensive designs much easier to maintain than code-intensive ones - I know that that statement is true and will at some point in the future require attention.

      Makeshifts last the longest.

      Perl would gain a big leap in flexibility if a pragma (or whatever) could state something like: "use LessMem;"

      The perl porters are aware of this, as the documentation for the less pragma shows:

      NAME
          less - perl pragma to request less of something from the compiler

      SYNOPSIS
          use less;  # unimplemented

      DESCRIPTION
          Currently unimplemented, this may someday be a compiler directive
          to make certain trade-offs, such as perhaps

              use less 'memory';
              use less 'CPU';
              use less 'fat';

      Of course, as it is unimplemented, it is still useless in practice, but you could try convincing the perl 5 porters that it should at least do something in 5.10.
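
      Purely as a hypothetical sketch (nothing here exists in core, and all names are made up): until less does something, a plain flag module could approximate PetaMem's "use LessMem;" idea today, with data modules consulting the flag when they choose between a roomy and a compact layout.

          # LessMem.pm -- hypothetical flag module, not a real pragma.
          package LessMem;
          use strict;
          use warnings;

          our $ENABLED = 0;

          sub import   { $ENABLED = 1 }   # "use LessMem;" turns the flag on
          sub unimport { $ENABLED = 0 }   # "no LessMem;"  turns it off

          1;

          # A (hypothetical) data module could then branch on the flag:
          #
          #   my $store = $LessMem::ENABLED
          #       ? PackedStore->new()   # compact, slower to access
          #       : HashStore->new();    # roomy, faster to access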

      -- Joost
      downtime n. The period during which a system is error-free and immune from user input.
