The only thing that seems suspicious to me s that you have to use an 8000-value key to query your cache. Are you sure this is the only way to go? maybe there is a shorter input query with less independent parametrs you could use as a cache key. You could have a rough-hit and then check afterwards whether the entry is appropriate or not.
Also, in exchanging against a complex and costly structure building process, you might probably do for storing the cached results on disk or in a database and then reading when needed. This means you don't use RAM for the thing and you could have a very simple cronjob running purging the stale entries cache.
From what I read, the process itself might be taking a couple of seconds, as you say it was 10x worse when sorting. I don't think reading a cached entry from disk should take averagely more than 0.2 seconds (and likely much less), so that's anyway a 10x improvement without the risks and hassles of ending physical RAM or finding yourself on swap space.
Just my $0.02