in reply to Why Coro?
If you attempt to run “3,000 threads,” let alone 12,000 ...
"You're dead, Jim ..." — Bones
The number of threads should be determined by the number of requests that you can actually process “at the same time” on your hardware. It should be a configurable number, and it should be fairly small.
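A small number of threads can very efficiently serve a large number of devices, on the presumption that “not every device will be sending data to us at the same instant.” There should be these thread pools:

1. A listener thread that receives incoming data from the devices, with acceptable latency, and places each request onto a queue.
2. A group of worker threads, sized to what the hardware can actually process at once, that take requests off the queue and process them.
3. A monitor thread that notices when a device has gone silent for too long and treats it as dead.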
You can easily see how this works, and how easily it can be tuned. We need to gather requests with acceptable latency and to know when a device has died; that is what the first and third threads do. Then we need to be sure that requests, once received, can be processed efficiently and without bottlenecks; that is what the second group does. Because the queue sits between them, nothing gets out of hand: when requests arrive faster than the workers can handle them, they simply wait in the queue instead of causing more threads to be created.
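To make the shape of this design concrete, here is a minimal sketch in Python (chosen only as a neutral illustration; the same structure can be built in Perl). It assumes devices are simulated by a list of `(device_id, payload)` messages, and every name in it (`run`, `find_dead`, `NUM_WORKERS`, `DEVICE_TIMEOUT`) is hypothetical. One listener feeds a bounded queue, a small fixed pool of workers drains it, and a monitor periodically records devices that have gone silent.

```python
import queue
import threading
import time

# Hypothetical sketch: all names and constants below are illustrative only.
NUM_WORKERS = 4        # tune to what the hardware can really process at once
DEVICE_TIMEOUT = 30.0  # assumed: seconds of silence before a device counts as dead
_STOP = object()       # sentinel telling a worker to exit


def find_dead(last_seen, now, timeout=DEVICE_TIMEOUT):
    """Return the ids of devices not heard from within `timeout` seconds."""
    return {dev for dev, seen in last_seen.items() if now - seen > timeout}


def run(messages, num_workers=NUM_WORKERS):
    """Serve (device_id, payload) messages with a small fixed pool of workers."""
    requests = queue.Queue(maxsize=100)  # bounded: bursts wait here, no new threads
    last_seen = {}
    lock = threading.Lock()
    results = []
    dead = set()
    stop_monitor = threading.Event()

    def listener():
        # Stand-in for the thread that reads from the devices.
        for device_id, payload in messages:
            with lock:
                last_seen[device_id] = time.monotonic()
            requests.put((device_id, payload))  # blocks while the queue is full
        for _ in range(num_workers):
            requests.put(_STOP)  # one sentinel per worker

    def worker():
        while True:
            item = requests.get()
            if item is _STOP:
                return
            device_id, payload = item
            with lock:
                results.append((device_id, payload * 2))  # placeholder "processing"

    def monitor():
        # Wake up periodically and record devices that have gone silent.
        while not stop_monitor.wait(0.1):
            with lock:
                dead.update(find_dead(last_seen, time.monotonic()))

    listen_t = threading.Thread(target=listener)
    monitor_t = threading.Thread(target=monitor)
    workers = [threading.Thread(target=worker) for _ in range(num_workers)]
    for t in [listen_t, monitor_t, *workers]:
        t.start()
    listen_t.join()
    for t in workers:
        t.join()
    stop_monitor.set()  # all work is done; let the monitor exit
    monitor_t.join()
    return results, dead


if __name__ == "__main__":
    # 3,000 simulated devices, two messages each, served by only 4 workers.
    msgs = [(dev, i) for i in range(2) for dev in range(3000)]
    results, dead = run(msgs)
    print(len(results), "requests processed;", len(dead), "devices marked dead")
```

The thread count stays at six no matter how many devices there are; only `NUM_WORKERS` (and the queue bound) need tuning to the hardware.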
Also note that many CPAN packages already implement this sort of thing, because it is a very common scenario. (Heck, it dates all the way back to IBM's “CICS” product on early IBM mainframes.) Don't build something that has already been built; it is very easy to find yourself doing exactly that.