I'm not sure I agree with you about the final join though. The OP just needs to keep the 1 to 5 "few k" chunks somewhere until he has the set. At that point he is going to transmit them.
He can just as easily write them as 5 seperate strings as 1, and avoid the need to concatenate them?
I think the most efficient way would be to pre-extend to the array to max_chanks, and then pass and store references to the chunks avoiding any need to copy or concatenate. In isolation from the code that generates the chunks, passes them to the buffering code, and finally transmits them, the whole thing is an academic exercise.
Even without my gaff, the benchmark was at best iffy.
"When I'm working on a problem, I never think about beauty. I think only how to solve the problem. But when I have finished, if the solution is not beautiful, I know it is wrong." -Richard Buckminster Fuller