http://qs1969.pair.com?node_id=1112666


in reply to HELP: Does anyone here routinely, deliberately, overcommit their servers? Or know anyone who does? Or heard tell of same? Anyone?
in thread RAM: It isn't free . . .

The working-set sizes are consistently smaller than the virtual-memory allotment that the applications need to be able to request.
You live in a very peculiar world...

I observe that it is extremely rare for the part I quoted not to be the case. It is far from "peculiar", in my experience.

But I would agree that operating hosts in ways that make heavy use of this fact used to be the rule, whereas now it is more likely to be considered something to carefully (even obsessively) avoid, especially on Production hosts.

But I read that making heavy use of this type of over-commitment of memory (virtual memory allotment vs. working set size) is the rule in cloud architectures. In the cloud (or other systems of many virtual machines), though, the term "over-commit" is usually reserved for the case where the accumulated (desired) working sets exceed the total physical memory, rather than the case where the total virtual memory allocations exceed it.
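To make that allotment-versus-working-set distinction concrete, here is a minimal sketch of my own (not from the original discussion), assuming a typical Linux/glibc host where a large malloc() reserves address space without faulting pages in. It requests 1 GiB, touches only 16 MiB of it, and reports VmSize (the allotment) against VmRSS (the working set) from /proc/self/status:

    /* Sketch: virtual-memory allotment vs. working set (Linux assumed). */
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>

    static void show_mem(const char *label)
    {
        char line[256];
        FILE *f = fopen("/proc/self/status", "r");
        if (!f) { perror("fopen"); return; }
        printf("--- %s ---\n", label);
        while (fgets(line, sizeof line, f))
            if (!strncmp(line, "VmSize:", 7) || !strncmp(line, "VmRSS:", 6))
                fputs(line, stdout);
        fclose(f);
    }

    int main(void)
    {
        size_t alloc = (size_t)1 << 30;   /* request 1 GiB of address space  */
        size_t touch = (size_t)16 << 20;  /* ...but dirty only 16 MiB of it  */

        show_mem("before malloc");
        char *p = malloc(alloc);
        if (!p) { perror("malloc"); return 1; }
        show_mem("after malloc (allotment grows; almost nothing is resident)");

        memset(p, 0xff, touch);           /* touching pages grows the RSS    */
        show_mem("after touching 16 MiB (working set grows; VmSize unchanged)");

        free(p);
        return 0;
    }

On such a host, VmSize jumps by roughly 1 GiB at the malloc() while VmRSS barely moves until the memset(); that gap is exactly what over-commitment of allocations exploits.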

But that isn't always the case. We try to discourage even over-commit in the strict sense (total allocations exceeding physical memory) in our Production VM racks. In some ways, this can be more important than it was for racks of separate physical hosts.

A large collection of virtual machines working together as the Production environment for some organization should be carefully built so that each VM has resources allocated to it sufficient to handle not just the peak load, but some percentage more than that. That way, an unprecedented peak in traffic or (more likely) some problem that increases load happening during peak is very unlikely to overload the system. At worst, the system should still be able to shed / refuse enough load that the requests that aren't rejected complete satisfactorily while the rest "fail fast" (after a reasonable number of retries, probably at several layers), hopefully being registered for another retry significantly later.
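The "shed / fail fast" part is easy to sketch in code. This is my own minimal illustration (the names and the 100-request budget are hypothetical, not from anyone's actual system): a simple admission gate that accepts work only while the host is inside its provisioned headroom and refuses the surplus immediately, so the accepted requests still complete satisfactorily:

    /* Sketch: fail-fast admission control with a fixed in-flight budget. */
    #include <stdatomic.h>
    #include <stdbool.h>
    #include <stdio.h>

    #define MAX_IN_FLIGHT 100            /* headroom budget for this host (hypothetical) */

    static atomic_int in_flight;         /* requests currently being served */

    static bool try_admit(void)
    {
        if (atomic_fetch_add(&in_flight, 1) >= MAX_IN_FLIGHT) {
            atomic_fetch_sub(&in_flight, 1);   /* over budget: shed this request     */
            return false;                      /* caller fails fast, may retry later */
        }
        return true;                           /* admitted; decrement when finished  */
    }

    int main(void)
    {
        /* Simulate a burst well above the budget: the surplus is refused at once
         * instead of dragging every request into overload.  (A real server would
         * decrement in_flight as each admitted request completes.) */
        int accepted = 0, shed = 0;
        for (int i = 0; i < 250; i++)
            try_admit() ? accepted++ : shed++;
        printf("accepted=%d shed=%d\n", accepted, shed);
        return 0;
    }

The point of the hard budget is the same as the per-VM hard limits discussed below: a request refused early is far cheaper than a request that drags the whole platform into overload.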

But those problems at peak often cause many virtual machines to use significantly more resources than they normally do at peak. If your VMs have not been configured with low enough hard limits on resources (memory size, number/speed of CPU cores), or if your VM hosting platform is over-committed relative to the sum of those hard limits (adding in I/O throughput, a resource which I don't think VMs do a good job of partitioning), then you risk the whole VM platform becoming significantly degraded. Now your 20 web servers having the problem, and the 30 related servers suffering from knock-on effects, are also impacting many of the servers you need in order to actually fix the problem, more than they would if you used separate physical hosts.

Our Production VM hosting platforms have the majority of their resources idle, even at peak.

A real cloud, on the other hand, has enough diversity that a significant percentage of the virtual machines going into overload at the same time is so unlikely that the expense of strictly avoiding over-commit would make the cloud's pricing uncompetitive.

- tye        


Replies are listed 'Best First'.
Re^4: RAM: It isn't free . . . (over-commit allocation -v- actual use)
by BrowserUk (Patriarch) on Jan 08, 2015 at 18:46 UTC
    But I read that making heavy use of this type of over-commitment of memory (virtual memory allotment vs. working set size) is the rule in cloud architectures.

    I think that there is a clear distinction between overcommitted allocation and overcommitted memory use.

    Whilst I agree that it is quite normal for the total of the memory allocations of all the VMs running on an individual cloud server to exceed the physical memory of that server, that does not mean that the physical server is overcommitted. I.e. it's fine for a server with 4GB to run 4 VMs allocated 2GB each, so long as the sum of the actual usages doesn't exceed 4GB. That would only happen if two or more of those VMs actually used their full allocations.

    But, AFAIK, that can't happen! At least not on any of the more modern hypervisors.

    Because any hypervisor worthy of its name will detect the attempt to overcommit the physical memory and move one or more of the VMs to a different server. (And equally, detect a machine with under-utilised physical memory, despite being overcommitted VM allocation-wise, and attempt to ship in another VM or two to utilise it.)
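    To put rough numbers on that allocation-versus-usage distinction, here is a small sketch of my own (the per-VM usage figures are invented purely for illustration): the allocation total for the 4GB-host / 4 x 2GB-guest example sits at 200% of RAM, while the usage total, the only figure that actually has to stay under 100%, remains comfortably below it:

        /* Sketch: allocation over-commit vs. actual-usage over-commit. */
        #include <stdio.h>

        int main(void)
        {
            double host_ram_gb  = 4.0;
            double alloc_gb[]   = { 2.0, 2.0, 2.0, 2.0 };  /* what each VM may request   */
            double working_gb[] = { 0.7, 0.9, 0.6, 0.8 };  /* what each VM actually uses */
            int    n = 4;

            double alloc_total = 0, use_total = 0;
            for (int i = 0; i < n; i++) {
                alloc_total += alloc_gb[i];
                use_total   += working_gb[i];
            }

            printf("allocation total: %.1f GB (%.0f%% of RAM) - over-committed only on paper\n",
                   alloc_total, 100 * alloc_total / host_ram_gb);
            printf("actual use total: %.1f GB (%.0f%% of RAM) - the figure that must stay under 100%%\n",
                   use_total, 100 * use_total / host_ram_gb);
            return 0;
        }

    Only when the second figure approaches 100% does the hypervisor have to do anything about it (swap, balloon, or migrate a guest elsewhere).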

    It's also possible to allocate a VM with more cores than the physical server possesses -- both Oracle VirtualBox and MS Virtual Server will allow me to specify that a VM is allocated 32 cores and run it on my 4-core commodity box; and I'm pretty sure (but don't actually know) that VMWare, Parallels and the other "big guys" in the field can do the same -- but it doesn't mean that those VMs actually get the benefit of those overcommitted cores.

    (Indeed, they suffer from it, because you have the guest OS carefully scheduling the VM's threads across the 32 cores, completely oblivious to the fact that the hypervisor is then mapping them back to the 4 actual cores. And whenever you have two schedulers trying to manage the same resources, they fight with each other and that costs dear. Same problem with languages that use their own scheduler and green threads; the two fight and nobody wins.)


    With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday'
    Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
    "Science is about questioning the status quo. Questioning authority".
    In the absence of evidence, opinion is indistinguishable from prejudice. Agile (and TDD) debunked