Re: STOP Trading Memory for Speed
by Elian (Parson) on Sep 25, 2002 at 18:34 UTC
Are you in a position to throw any development resources towards perl itself? Hugo's concern about maintainability is definitely warranted, as there's a limited amount of engineering resource available to perl, and past decisions have definitely made an impact on what's easy and what's hard with the current code base. The best way to make sure what you want to have happen actually happens is to help make it happen.
Update: This isn't meant to be snide or anything--many people are unaware of how much time and effort it takes to add even seemingly simple features to a large and complex project like perl, nor how much time you have to spend afterwards maintaining that code...
Re: STOP Trading Memory for Speed
by perrin (Chancellor) on Sep 25, 2002 at 19:02 UTC
Have you tried using the latest BerkeleyDB with an in-memory only database? It may be fast enough and small enough for your needs. If that doesn't work, you could consider using a C hash library and writing a thin Perl front-end to it (maybe with Inline::C), which would give you total control of the tradeoff in speed vs. memory for your application.
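The in-memory variant is just a tie with no filename - a rough sketch (untested, and the exact flags depend on your BerkeleyDB module version):

    use strict;
    use BerkeleyDB;

    # Omitting -Filename asks Berkeley DB for an anonymous, in-memory database.
    my %cache;
    tie %cache, 'BerkeleyDB::Hash',
        -Flags => DB_CREATE
      or die "Cannot create in-memory DB: $BerkeleyDB::Error";

    $cache{foo} = 'bar';
    print "$cache{foo}\n";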
One other thought: have you tried just letting it swap? Depending on your algorithm, the VM system might do a great job of efficiently keeping the part you care about in RAM while swapping the rest to disk. That was my experience with a very large data set on FreeBSD. We didn't even realize it had started swapping because it slowed down so little.
You didn't read his problem.
The memory limit is not physical memory, it is the fact that you can only address 4 GB with a 32-bit pointer.
BerkeleyDB might solve this by taking less memory. It might also be able to use anonymous paging to work even though it cannot directly address most of that memory. The documentation says that data sets that large usually speed up by switching from hashing to a BTREE.
But letting it swap won't work.
Re: STOP Trading Memory for Speed
by diotalevi (Canon) on Sep 25, 2002 at 18:06 UTC
This is also highly relevant for those of us that have convinced our ISPs to run a mod_perl process but want to keep server memory utilization lower so it isn't such an admin headache. At the lower end of the spectrum anyway ;-)
It doesn't actually reduce memory usage, but Apache::SizeLimit can keep your processes under control so that you don't have to have an admin watch them.
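The setup is only a few lines - a rough sketch for the mod_perl 1.x flavour (the size numbers here are invented, and are in KB):

    # in startup.pl
    use Apache::SizeLimit;
    $Apache::SizeLimit::MAX_PROCESS_SIZE       = 30000;  # ~30MB, then the child exits
    $Apache::SizeLimit::CHECK_EVERY_N_REQUESTS = 5;      # don't bother checking every hit

    # and in httpd.conf:
    #   PerlFixupHandler Apache::SizeLimit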
Re: STOP Trading Memory for Speed
by jepri (Parson) on Sep 26, 2002 at 00:57 UTC
After reading all these responses, it occurs to me that you are overlooking one other thing: what happens when your dataset grows again? You might be able to reduce perl's memory use to 50%, 25% or 1% of its current bloat, but your dataset will keep growing, and there's a real limit on how much overhead you can whittle away.
Why not shift your data storage off to a cluster? You could do this in many ways. There are distributed database products, or you could roll your own (it is very easy).
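To give an idea of how little is involved in rolling your own, here is a rough sketch of one node serving its slice of a hash over TCP (hypothetical line-based protocol, no error handling):

    use strict;
    use IO::Socket::INET;

    my %shard = ( foo => 42, bar => 7 );   # this node's slice of the data, however you load it

    my $server = IO::Socket::INET->new(
        LocalPort => 4242,
        Listen    => 5,
        Reuse     => 1,
    ) or die "listen: $!";

    while ( my $client = $server->accept ) {
        while ( my $line = <$client> ) {
            chomp $line;
            my ( $cmd, $key ) = split ' ', $line, 2;
            my $answer = ( $cmd eq 'GET' && exists $shard{$key} ) ? $shard{$key} : '';
            print $client "$answer\n";
        }
        close $client;
    }

The client side just picks a node by hashing the key (say, a crc32 of the key modulo the number of nodes) and sends "GET key" down the socket.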
There are also various types of clustering software, which may or may not be appropriate depending on exactly how you are processing your data.
Finally, keep in mind that gigabit Ethernet is probably faster than even your RAID array's transfer capability. A couple of computers with 1GB of memory each, linked up with gigabit network cards, will thrash almost any disc array, and still be cheaper than that 64-bit UltraSPARC.
My favourite solution always was 'just add another computer'.
____________________
Jeremy
I didn't believe in evil until I dated it.
Re: STOP Trading Memory for Speed
by Aristotle (Chancellor) on Sep 25, 2002 at 20:38 UTC
There's a flaw in your argumentation.
- Your assertion is: CPU speed is increasing faster than memory bandwidth.
- Your example shows: IO speed is an order of magnitude lower than memory bandwidth.
- But for all we know at this point, if your dataset were smaller or your memory larger, the 1000% overhead approach would beat a 30% overhead one.
Point 3 refutes point 1, with point 2 bearing no relevance to either. All you have proven is:
You can trade memory to gain speed - so long as you have enough memory.
That, I believe, isn't news to anyone. The fact that the 1000% overhead forced you to go for disk IO rather than all-in-memory is irrelevant because it doesn't have any impact on the relation between CPU speed and memory bandwidth.
You do have a point in complaining about the high overhead approach forcing you to take refuge in disk IO, but that's an entirely different argument from saying that trading memory for clock cycles will backfire when an application fits in memory.
Which of these two different points do you want to discuss?
Makeshifts last the longest.
You're missing some rather fundamental points here. On some CPU architectures you can see stall times of up to 200 cycles when accessing memory that isn't in L1 or L2 cache, and with large data sets this can be a significant overhead. Significant enough that in some cases it's actually faster to have a more compact representation in memory (and thus a higher density of real data in cache) with extra code to decode that data than it is to have simple code with relatively loosely packed data.
For example, while it's much more efficient on most RISC architectures to use 32 bit or 64 bit data elements than 8 bit ones (since 8 bit accesses generally cost an extra mask and shift, done either internally or in your own code), if using only 8 bit data means a larger portion of your data set fits in cache, with fewer spills to main memory (or even to L2 cache), then the extra cycles spent decoding your data are fewer than the cycles you'd otherwise spend stalled on the memory bus.
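The same trade shows up even at the Perl level, though less dramatically - for instance, packing a large set of byte-sized values into a single string and paying a vec() call per access, instead of spending a full scalar per element (an illustration only, not a benchmark):

    use strict;

    # One byte per counter: roughly 1MB for a million entries.
    my $packed = "\0" x 1_000_000;
    vec( $packed, 42, 8 ) = 200;             # "encode": store counter 42
    my $value  = vec( $packed, 42, 8 );      # "decode": read it back

    # The loosely packed version: one full scalar per element,
    # costing many times the memory.
    my @loose = (0) x 1_000_000;
    $loose[42] = 200;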
I've done a fair bit of assembly programming and am perfectly aware of issues of cache locality, but that issue alone doesn't make his example any more relevant.
There's also the opposite case, where the inner loops don't fit into L1 - which turns out to be an order of magnitude more costly than fetching data from the slower memory tiers. It's harder to hit that limit of course, but as most CPUs have much smaller caches for code than for data, it's a real possibility nevertheless, considering the appeal here is that we should routinely trade code size for data size.
It's rarely quite as easy as "stop trading memory for speed!" or "lookup tables make things faster". Top performance, if it's needed at all in the first place, cannot be achieved any other way than by benchmarking and profiling at a low level to find out exactly where the bottlenecks are. And since bottlenecks usually interdepend, you often have to be careful not to make an "improvement" that actually worsens the overall situation.
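Even a micro-decision like packed string versus plain array is something the stock Benchmark module can settle in seconds (a sketch; the numbers will vary wildly between machines and perl builds):

    use strict;
    use Benchmark qw(cmpthese);

    my $packed = "\0" x 100_000;
    my @loose  = (0) x 100_000;

    cmpthese( -3, {
        packed => sub { my $v = vec( $packed, 12_345, 8 ) },
        loose  => sub { my $v = $loose[12_345] },
    } );

    # For whole programs, Devel::DProf gives the function-level picture:
    #     perl -d:DProf yourscript.pl && dprofpp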
Makeshifts last the longest.
Hi,
Nice logical analysis and argumentation - unfortunately not quite correct, as the "two different points" are entangled and thus have to be discussed together. Let me try again:
There is a gap between memory speed and CPU speed. This gap has grown in the past and will probably continue to grow - or let's even say it will remain constant. It certainly will not shrink or invert as long as we have traditional von Neumann machines instead of systolic arrays or whatever.
Therefore: yes, there are algorithms that gain speed by trading memory for it, even if the memory is slower than the processing element. But our computers are not theoretical Turing machines with infinite resources - there are hard limits. One of these hard limits is direct 4GB addressing on 32bit architectures.
If you hit this limit, your design decisions become severely constrained. Often you have to fall back on an implementation that uses the "remaining features" of your hardware - and in our case that means using the (relatively) enormous amount of disk space compared with RAM.
So: IF the gap between memory and CPU speed continues to grow (the most likely scenario), trading memory for clock cycles may very well backfire. Admittedly this is a long-term view and should only be seen as an auxiliary argument for rethinking perl's memory usage design philosophy (or dogma?).
The more important argument is the real-world situation: by trading very (too) much memory for clock cycles, you hit the hard limits sooner, which forces you onto features of today's hardware that are inferior to the ones the hard limit keeps you from using.
So my whole argument is: IMHO the design decision "memory for speed" is weighted too far in one direction. Perl would gain a big leap in flexibility if a pragma (or whatever) could state something like "use LessMem;" - every user could decide whether the resulting gain in available memory and loss in speed is OK for his applications. Those of us who hit the hard limits of today's hardware could live with such a perl until a 64bit machine with 40GB of RAM becomes commodity hardware. And even then some might wish not to hit THAT hard limit...
As for the remark in this thread about the development resources we'd be willing to invest in this: well, we have no knowledge of the perl internals, but this problem is critical to our application, so yes - we'll probably try some Inline::C and a C hash library, and if none of that helps, we'll dig into perlguts. :-)
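The Inline::C route would start with something as trivial as this (a toy example, nothing like our real data structures):

    use strict;
    use Inline C => q{
        /* a trivial C function, callable from Perl as add() */
        int add(int x, int y) {
            return x + y;
        }
    };

    print add(2, 3), "\n";   # prints 5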
Bye
PetaMem
You say:
Yes, there are algorithms that gain speed by trading memory for it, even if the memory is slower than the processing element. But our computers are not theoretical Turing machines with infinite resources - there are hard limits. One of these hard limits is direct 4GB addressing on 32bit architectures.
I said:
You do have a point in complaining about the high overhead approach forcing you to take refuge in disk IO, but that's an entirely different argument from saying that trading memory for clock cycles will backfire when an application fits in memory.
You're not saying anything I hadn't already covered. Yes, your points are correct, but you made the same mistake as the original poster of applying the "too much memory forces me onto disk" argument to the "cache misses are getting more and more costly" statement.
These have nothing to do with each other. To argue in favour of the latter, you have to show an example where trading code size for data size actually accelerated an application that had not hit the hard limit to begin with.
Note I did not comment on the "cache misses are getting more and more costly" statement. I don't want to follow the suggestion of trading code size for data size - not only because I believe the disparity is not yet large enough to justify that trade in most cases, but mainly because I frequently find data intensive designs much easier to maintain than code intensive ones - but I do know that the statement is true and will at some point in the future require attention.
Makeshifts last the longest.
NAME
    less - perl pragma to request less of something from the compiler

SYNOPSIS
    use less;  # unimplemented

DESCRIPTION
    Currently unimplemented, this may someday be a compiler directive
    to make certain trade-offs, such as perhaps

        use less 'memory';
        use less 'CPU';
        use less 'fat';
Of course, as it is unimplemented, it is still useless in practice, but you could try convincing the perl 5 porters that it should at least do something in 5.10.
--
Joost
downtime n. The period during which a system is error-free and immune from user input.
Re: STOP Trading Memory for Speed
by Lexicon (Chaplain) on Sep 25, 2002 at 22:06 UTC
Thanks for your editorialized title, but you're overhyping. In my Mathematics::Combinatorics::Combinator, the memory/speed trade-off was one of those lifetime-of-the-universe sorts of time differences. The CPU/memory speed ratio isn't ever likely to get that big. Hence, the memory-for-speed trick will always be a useful tool.
It is almost assuredly less work for you to write your own memory management than for Perl to change whatever you feel is necessary to fix your problem. If you want more control and speed, use C. And who knows what will happen with Perl 6 anyway.
Re: STOP Trading Memory for Speed
by BrowserUk (Patriarch) on Sep 25, 2002 at 23:56 UTC
Without at least a concept of the structure of your application, it's difficult to see what alternatives there are to your problem, other than the one which you can't have. Maybe that's the point, but I was party to a solution to a similar problem, where maxed-out memory forced a partially disc-based dataset, which resulted in an unacceptable reduction in performance.
The solution we arrived at was to partition both the dataset and the algorithms and run it in two halves on two separate machines. Communication was done through a dedicated 100Mb/s FDDI link. There was talk of moving to a 155Mb/s ATM link, but this proved unnecessary.
It was initially thought that putting half the data and half the code on each box was the way forward, but in the end the codeset was almost identical on both, and actually gained some weight due to the (surprisingly small) extra code needed to handle the communications.
The simple act of partitioning the dataset in two freed enough memory that the additional code was handled without penalty.
Just a possibility.
Cor! Like yer ring! ... HALO dammit! ... 'Ave it yer way! Hal-lo, Mister la-de-da. ... Like yer ring!
Re: STOP Trading Memory for Speed
by rob_au (Abbot) on Sep 26, 2002 at 00:55 UTC
Re: STOP Trading Memory for Speed
by Abigail-II (Bishop) on Oct 02, 2002 at 11:07 UTC
Well, I read the article (I wasn't at YAPC::Munich), and there it's just a side note. Nothing as dramatic as you sketch it here.
Our machines are 32bit and because of cost there will be no big iron (64bit) used for our development.
Since when is 64 bits "big iron"? With "big iron", people usually mean mainframes. But for instance Sun hardware has been 64 bits for years - even their cheapest desktop (which used to be less than $1000 - I don't know if that's still the price). "All the world is Intel" is no more true than "all the world is a VAX" used to be.
I'm willing to bet that 64bit hardware will be common way before CPUs are so much faster than memory that trading "memory for speed" isn't worthwhile anymore.
One should also realize that the "memory for speed" trade also buys a lot of flexibility. If we didn't have the overhead we have for values, we couldn't easily implement operations like push, ".=", or length in (average) constant time - they would suddenly be linear in the size of the data. It also means symbolic references wouldn't work the same way they do now. You'd end up with something that wouldn't be Perl anymore.
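You can actually watch that buffering with Devel::Peek - the PV line of the dump shows both CUR (bytes in use) and LEN (bytes allocated), and after a run of appends LEN is usually the larger of the two:

    use Devel::Peek;

    my $string = "x";
    $string .= "y" for 1 .. 1000;

    Dump($string);   # compare CUR and LEN in the PV line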
Abigail
When do you bet that 64-bit hardware will be common?
16-bit hardware and paging extensions lasted from the Commodore to the 386. Intel's PAE hack and their lack of a visible 64-bit consumer strategy (Itanium is not a consumer strategy) suggest that they will try to put off the 64-bit migration in the same way. Given the performance and space hits you take with 64 bits, it might win.
As for memory for speed: if the amount of pre-buffering for arrays, the hash fill level that triggers a split, and similar parameters were tunable at runtime, then "use less" could be made to work. Perl would still trade memory for performance, but how much could be adjusted significantly.
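A couple of such knobs already exist at runtime, though only for the opposite trade (asking Perl to pre-allocate for speed):

    my @array;
    $#array = 9_999;       # pre-extend the array to 10_000 slots up front

    my %hash;
    keys(%hash) = 1_024;   # pre-allocate at least 1_024 hash buckets

    # "use less 'memory'" would want the reverse - a way to tell Perl
    # to grow these structures less aggressively.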
Oh - come on, Abigail.
64bits from Sun @ $1000? You're joking. Not even the recent 32bit Linux systems from Sun cost less than US$1,500 with 256MB RAM - and they're PCs with a Sun label.
And I say "big iron" because the memory overhead in perl is much worse on 64bit architectures than it is on 32bit, so to fit in memory everything you can fit on a 32bit/4GB config, you need up to a 64bit/12GB configuration. YES, a factor of 3. So to double your capacity with the help of 64bit, you need to buy BIG iron with at least 24GB. Try it - and try to get it for less than US$50,000.
But I'm happy that most of the replies (and current considerations for 5.10) give more focus to this problem. So the most realistic scenario for us is that solutions will come from three directions:
- We are designing the application's architecture to be partitionable, so it can be distributed over n nodes of a cluster.
- Some work on the perl side will surely be done to lower perl's memory overhead (thanks for the "use less" pragma pointer).
- Maybe in 3-5 years we'll really have 64bit machines with xx GB for a few bucks and will build clusters with those.
If all that works out, I'll be a happy man. :-)
Bye
PetaMem