There are a number of considerations here. First, to your point about debugging: yes, multithreading is generally much harder to debug. I had a conversation with a monk many years ago on this topic where he was advocating perl threads for something, using a shared variable to send back all the information. I smelled some male-bovine manure, so I tested it out and showed that it was, in fact, problematic. He doubled down, and eventually I gave up, having proven my point that multi-threading is harder than it appears. As I recall, he's no longer a member, but then again, mostly neither am I :)
So this game will be a vue-based web front end with a REST API backend, which is similar to the original (although vue will be an upgrade from what it was as well). With the perl backend, nginx would proxy each request to one of two servers, both running the same code (horizontal scaling!), where a process pool would pick up the request and run with it. But each process would make calls to the db, both for reading and writing, as well as to memcache, etc., which meant the process would sit idle for large stretches of time. This isn't hugely horrible; the kernel will see it's sleeping and move on. But processes are significantly more overhead for the kernel to track and manage than threads (though less so on linux than windows), and both are significantly more overhead than coroutines. So a single thread will be able to send multiple requests to various sources (postgres, redis, etc.) and will only need actual threads for computation, of which there likely won't be much most of the time (the main resource-processing loop might be an exception, but even that likely isn't much). Handling dozens of simultaneous requests on a single thread should be possible with proper coroutine support.
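To make "one thread, many in-flight requests" concrete, here is a minimal Perl sketch using Mojo::Pg (which comes up below); the connection string, tables, and columns are made up for illustration:

```perl
use Mojo::Base -strict;
use Mojo::Pg;
use Mojo::Promise;

# One OS thread, one event loop: both queries are on the wire at the
# same time, and we only wait while *neither* has answered yet.
my $pg = Mojo::Pg->new('postgresql://game@/gamedb');   # made-up DSN

my $player   = $pg->db->query_p('select * from players  where id = ?', 42);
my $villages = $pg->db->query_p('select * from villages where player_id = ?', 42);

Mojo::Promise->all($player, $villages)->then(sub {
  my ($p, $v) = @_;   # each is an arrayref holding a Mojo::Pg::Results
  say 'player:   ', $p->[0]->hash->{name};
  say 'villages: ', $v->[0]->rows;
})->wait;   # spin the event loop until both promises resolve
```

Nothing here sleeps in the kernel while a query is outstanding; the event loop simply has nothing scheduled until a socket becomes readable.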
There is also the idea of adding websockets into the mix, and that, too, should be doable with few, if any, additional threads, as most of the time those coroutines will be dormant.
As to checkpointing, that's a bit further than I usually go with coroutines, but I might have to go there with websockets to shut them down cleanly. Even then, there is a way (more than one, really) to tell the server to save all its state and exit, and in C#, that way is a CancellationToken. Once the cancellation is received, you do whatever you need to cancel things, which could include saving state, though usually it's simply throwing an exception to unwind the call stack. Mostly this isn't an issue because everything happens in the database, which is a requirement for horizontal scaling anyway.
My theory is that the workload being done via perl on two VM servers could be handled trivially on a single, smaller VM with coroutines. My day job involves a more or less similar scenario, with insufficiently optimised computation in many places and likely an order of magnitude more simultaneous users than I could dream of for my project, which leads me to conclude that there was something wrong with the original setup; I believe (could be wrong) that the process-per-request overhead is the cause.
The new model will likely also involve nginx as a proxy to a single backend server (so we can eventually scale horizontally, though I highly doubt it'll ever be necessary), with nginx also serving the vue code and static assets directly. Both of these will now live on the same server. Redis and a scheduler/discord bot will also live on that server (though not listening on any public ports). Keeping things secure if we do scale horizontally will be a bit of a challenge, but should be fine. Postgres will live on a second server. This is compared to the original system using 5 servers.
Ok, so it's basically the standard web-worker model. If you haven't yet, I would suggest looking at Mojolicious. It has an amazingly convenient system for websockets, and is built around an event loop. When paired with Mojo::Pg, you can implement an entire web worker with non-blocking calls very conveniently. It's not quite as convenient as async/await keywords, but the way it organizes callbacks into events on objects is almost as nice.
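For a taste of how little code the websocket side takes, here's a minimal Mojolicious::Lite chat-server sketch; the route name and timeout are illustrative:

```perl
#!/usr/bin/env perl
use Mojolicious::Lite -signatures;

my %clients;   # connected websocket clients, keyed by connection id

websocket '/chat' => sub ($c) {
  my $id = "$c";           # stringified controller as a unique key
  $clients{$id} = $c;
  $c->inactivity_timeout(300);

  # Everything is an event on the connection object ...
  $c->on(message => sub ($c, $msg) {
    $_->send($msg) for values %clients;   # broadcast to everyone
  });
  $c->on(finish => sub ($c, @) {
    delete $clients{$id};                 # client went away
  });
};

app->start;
```

The event loop owns all of the connections; no thread is parked per client, which is why dormant websockets cost almost nothing.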
I did a review of all the async solutions for websockets a few years ago in my YAPC talk "The Wide World of Websockets". I wasn't using a database for any of those, but I implemented a simple chat server in each perl library, and Mojo seemed like the clear winner. Meanwhile, the multiplayer Asteroids demo is still live at https://nrdvana.net/asteroids (it's only half-implemented and a little buggy, but it shows the possibilities pretty well; click 'reconnect' however many times it takes...).
I can't recall for sure, but I suspect I looked at Mojo for the first time five or six years ago, just long enough to write the test that Ovid uses for his consulting company (he didn't like some of my design decisions; I disagreed, but it is his prerogative), and not again since.
However, there are still some fundamental problems with perl asynchronicity. That is, any XS driver not aware of it has to be managed somehow. Coro (which I used for the CB stats, the original topic) did this by shunting blocking calls off to separate (real) threads, I believe. I'd have to look at the code, but I now suspect the code I was using just did a hard, blocking wait on the db, which is kind of horrible, even worse than consuming extra threads.
As for Mojo::Pg, it turns out DBD::Pg doesn't use threads in the background: it exposes libpq's non-blocking query protocol (the pg_async attribute), and Mojo::Pg watches the connection's socket from the event loop. Ideally it's just that one mechanism for all of the async queries, so that does seem to be the solution here. However, I would be surprised if most of the DB drivers supported this async behaviour. Yes, I've chosen Postgres for this application, but at this point I could pretty easily change my mind. At the end of the day, some drivers simply never implement an async interface at all (DBI doesn't require one). Also, this game was originally based on DBIx::Class, and maybe this has changed, but I don't recall it supporting the async interfaces of mysql or postgres. (This is what I mean by the whole stack needing to support the async nature.)
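For reference, here is roughly what DBD::Pg's native async interface looks like on its own, with no event loop and no threads; the DSN is made up, and the polling loop is a stand-in for whatever work the event loop would be doing (a real loop would watch $dbh->{pg_socket} for readability instead):

```perl
use strict;
use warnings;
use feature 'say';
use DBI;
use DBD::Pg qw(:async);   # exports the PG_ASYNC flag

my $dbh = DBI->connect('dbi:Pg:dbname=gamedb', 'game', '',
                       { RaiseError => 1 });

# Send the query and return immediately; libpq does this with
# PQsendQuery on the existing connection, no extra threads.
$dbh->do('select pg_sleep(2), 42 as answer', { pg_async => PG_ASYNC });

until ($dbh->pg_ready) {
  sleep 1;   # stand-in for "go service other coroutines"
}
my $rows = $dbh->pg_result;   # finish the query, get the row count
say "query done, $rows row(s)";
```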
And then "callbacks" is exactly the problem that async/await is supposed to solve (with javascript almost explicitly being that - to avoid Promises with callbacks inside Promises of callbacks inside ... just use async/await). But this is a bit of a digression - the point really comes down to
- Co-routines are way better than threads. Event loops are related, with similar performance, though considerably more annoying to use. A single coroutine thread can handle dozens, if not hundreds, of actions with less overhead than threads, as long as the work is mostly I/O-driven and can be waited upon by the main event loop. I've managed parallel processing across clusters of 40+ AIX systems using ssh under AnyEvent, and the management node was essentially idle, while coworkers were writing their code to do one node at a time. I was a huge fan. But there are limits.
- Languages where coroutines are native give every driver the incentive to do things in that one, single, standard way (which is what AnyEvent was supposed to provide but, due to certain factors I won't get into, was largely rejected by the community), and there is a huge advantage there.
- Combine coroutines with real threads, running the coroutines on threadpools, and the amount of work you can get done trivially is immense.
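As promised above, here is what that flattening looks like in Perl today, using Mojolicious's async/await support (which rides on Future::AsyncAwait); the URL and the scraping logic are just illustrative:

```perl
use Mojo::Base -strict, -signatures, -async_await;
use Mojo::UserAgent;
use Mojo::URL;

my $ua = Mojo::UserAgent->new;

# Two dependent fetches that would otherwise be nested callbacks;
# each await yields to the event loop instead of blocking the thread.
async sub first_link_title ($url) {
  my $tx   = await $ua->get_p($url);
  my $href = Mojo::URL->new($tx->result->dom->at('a[href]')->{href})
                      ->to_abs(Mojo::URL->new($url));
  my $tx2  = await $ua->get_p($href);
  return $tx2->result->dom->at('title')->text;
}

first_link_title('https://mojolicious.org')
  ->then(sub ($title) { say "first link points at: $title" })
  ->wait;
```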
Believe me, I'd avoid anything from Microsoft if I could, but they've produced a language and virtual machine here that actually strike a really good balance between competing requirements. The only thing I'm really missing at this point is the type-and-run aspect of perl, though I could also do with regexes being elevated to first-class language constructs like in perl :) Things like perl -ML -E 'L->do_something(shift)' 12345 were really handy. I can still share libraries easily enough, but not THAT easily.