The route to improving performance is often a long and winding one. Sometimes you can save effort by examining the boundaries of the problem. For example, are there any consistent features of the regexes that can be exploited to simplify the algorithm you end up with? Are the regexes being assembled from multiple regexes that are available separately (e.g. capturable as CGI parameters)? Generally, if big wodges of data are coming my way, before turning the mill I try to cut them into smaller, more manageable pieces, either size-wise (e.g. comparing fixed-size chunks, as in the sketch below) or scope-wise (e.g. using a lexer and parser).
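
A minimal Python sketch of the size-wise cut, assuming every match has a known maximum length (the function name `scan_in_chunks` and the `max_match_len` parameter are inventions for illustration, not anything from a particular library):

```python
import re

def scan_in_chunks(pattern, stream, chunk_size=64 * 1024, max_match_len=256):
    # Stream a large input through a regex in fixed-size chunks instead of
    # slurping the whole thing into memory. max_match_len must be an upper
    # bound on the length of any single match; a longer match straddling a
    # chunk boundary would be missed.
    regex = re.compile(pattern)
    buffer, base = "", 0
    while True:
        block = stream.read(chunk_size)
        if not block:
            # EOF: everything left in the buffer is final, so flush it.
            for m in regex.finditer(buffer):
                yield base + m.start(), m.group()
            return
        buffer += block
        # A match starting in the last max_match_len chars might be cut off
        # by the chunk boundary, so hold those back for the next round.
        cut = max(len(buffer) - max_match_len, 0)
        last_end = 0
        for m in regex.finditer(buffer):
            if m.start() >= cut:
                break
            yield base + m.start(), m.group()
            last_end = m.end()
        # Drop the text already consumed; keep only the uncertain tail.
        keep = max(cut, last_end)
        buffer, base = buffer[keep:], base + keep

# e.g. pull IPv4-looking tokens out of a log too big to slurp:
#   with open("huge.log") as f:            # hypothetical file
#       for offset, tok in scan_in_chunks(r"\d{1,3}(?:\.\d{1,3}){3}", f):
#           print(offset, tok)
```

The carried-over tail is what keeps the chunking honest: without it, a match falling across a chunk boundary would silently vanish.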