You're right that the HTML download is the bottleneck, and I'm doing a lot of work to optimize it (including parallel processing).
However, making the regexes a bit faster would also help, and to be honest I'm also curious about the answer to this question.