First of all, congratulations on finding this out. That's very good, and you're right: people tend to use components or structures without understanding them at all.
The only thing that seems suspicious to me is that you have to use an 8000-value key to query your cache. Are you sure this is the only way to go? Maybe there is a shorter query input, with fewer independent parameters, that you could use as a cache key. You could get a rough hit on that shorter key and then check afterwards whether the entry is appropriate or not.
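A minimal sketch of the rough-hit idea, in Python purely for illustration. It assumes the full query can be represented as a dict of parameters and that a few of them (the hypothetical `COARSE_FIELDS`, here `region` and `date`) are enough to narrow down the candidates. The class and field names are made up. The short key only selects a bucket, and each stored entry keeps its full query so the match can be verified afterwards.

```python
import hashlib
import json
from typing import Any, Callable, Dict, List, Optional, Tuple

# Hypothetical: the few parameters assumed to be enough for a rough hit.
COARSE_FIELDS = ("region", "date")


class CoarseKeyCache:
    """Cache indexed by a short key built from a few query parameters.

    A lookup on the short key is only a rough hit; every stored entry keeps
    its full query so we can check afterwards whether it really fits.
    """

    def __init__(self, coarse_fields=COARSE_FIELDS):
        self.coarse_fields = coarse_fields
        self._buckets: Dict[str, List[Tuple[dict, Any]]] = {}

    def _coarse_key(self, query: dict) -> str:
        # Hash only the selected fields, serialized in a stable order.
        subset = {f: query.get(f) for f in self.coarse_fields}
        raw = json.dumps(subset, sort_keys=True, default=str)
        return hashlib.sha256(raw.encode("utf-8")).hexdigest()

    def get(self, query: dict,
            is_appropriate: Optional[Callable[[dict, dict], bool]] = None) -> Optional[Any]:
        # Default check: the stored full query must equal the requested one.
        check = is_appropriate or (lambda stored, wanted: stored == wanted)
        for stored_query, value in self._buckets.get(self._coarse_key(query), []):
            if check(stored_query, query):
                return value
        return None

    def put(self, query: dict, value: Any) -> None:
        # Keep a copy of the full query next to the value for later verification.
        self._buckets.setdefault(self._coarse_key(query), []).append((dict(query), value))


cache = CoarseKeyCache()
q1 = {"region": "eu", "date": "d1", "p1": 1, "p2": 2}
cache.put(q1, "result-1")
print(cache.get(q1))            # result-1
print(cache.get(dict(q1, p2=3)))  # None: same rough hit, but the check fails
```

The default check is plain equality of the full query; passing your own `is_appropriate` predicate lets you accept close-enough entries instead, which is where the rough hit pays off.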
Also, given that the alternative is a complex and costly structure-building process, you could probably get away with storing the cached results on disk or in a database and reading them back when needed. That way the cache doesn't use RAM at all, and a very simple cron job can purge the stale entries.
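If you go the disk route, something along these lines could work. This is a sketch using Python's built-in `sqlite3` module; the table layout, the JSON encoding of values and the `purge_older_than` method are assumptions for illustration, not a prescription. Keys are plain strings here, so with an 8000-value key you would probably hash it to a digest first.

```python
import json
import sqlite3
import time
from typing import Any, Optional


class DiskCache:
    """Hypothetical disk-backed cache stored in a single SQLite table."""

    def __init__(self, path: str = "cache.db"):
        self.conn = sqlite3.connect(path)
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS cache ("
            " key TEXT PRIMARY KEY,"
            " value TEXT NOT NULL,"
            " created REAL NOT NULL)"
        )
        self.conn.commit()

    def put(self, key: str, value: Any) -> None:
        # Store the value as JSON text; an existing entry with the same key is replaced.
        self.conn.execute(
            "INSERT OR REPLACE INTO cache (key, value, created) VALUES (?, ?, ?)",
            (key, json.dumps(value), time.time()),
        )
        self.conn.commit()

    def get(self, key: str) -> Optional[Any]:
        # Return the decoded value, or None on a cache miss.
        row = self.conn.execute(
            "SELECT value FROM cache WHERE key = ?", (key,)
        ).fetchone()
        return json.loads(row[0]) if row else None

    def purge_older_than(self, max_age_seconds: float) -> int:
        # Delete entries older than the given age; returns how many were removed.
        cutoff = time.time() - max_age_seconds
        cur = self.conn.execute("DELETE FROM cache WHERE created < ?", (cutoff,))
        self.conn.commit()
        return cur.rowcount


cache = DiskCache(":memory:")  # use a real file path to persist across runs
cache.put("query-123", {"rows": [1, 2, 3]})
print(cache.get("query-123"))        # {'rows': [1, 2, 3]}
print(cache.purge_older_than(3600))  # 0, nothing is an hour old yet
```

The cron side can then be a tiny script that opens the same database file and calls `purge_older_than` with whatever maximum age suits you.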
From what I read, the process itself might take a couple of seconds, since you say it was 10x worse when sorting. I don't think reading a cached entry from disk should take more than 0.2 seconds on average (and likely much less), so that's still roughly a 10x improvement, without the risks and hassles of running out of physical RAM or ending up in swap space.
Just my $0.02