Hullo
I have a script that's nothing but a lot of simple s/// substitutions. It's a redirect script for Squid: customers point their domain at our machine, and that box then forwards each request on to whichever of our servers should handle it. So, the script looks something like:
    while ( <> ) {
        ...
        elsif ( s|http://www.theirsite.com/\W|http://our.server1/theirsite/\n|i ) { }
        elsif ( s|http://www.theirsite.com/|http://our.server1/theirsite/|i ) { }
        elsif ( s|http://www.dummy.com/\W|http://our.server2/dummy/\n|i ) { }
        # .. ad nauseam
    }
A couple of questions. First, I'd love some ideas for how to make this work more straightforwardly, especially for defining the rules more clearly than as a long list of regexes. Maybe it could use a text config file that expresses the simple, similar rules, which would then be compiled into regexes at startup?
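To make the config-file idea concrete, here's a rough sketch of what I have in mind. The two-column "source prefix, target prefix" format and the names compile_rules and rewrite are made up for illustration, and the rules sit in a heredoc here only so the example runs on its own; the real thing would read them from a file at startup. It also only handles the URL itself, not the rest of the line Squid sends.

```perl
use strict;
use warnings;

# Hypothetical rule table: "source-prefix  target-prefix", one rule per line.
# In the real script this text would come from a config file.
my $config = <<'END';
# source prefix              target prefix
http://www.theirsite.com/    http://our.server1/theirsite/
http://www.dummy.com/        http://our.server2/dummy/
END

# Compile each rule once, at startup, into a [ regex, replacement ] pair.
sub compile_rules {
    my ($text) = @_;
    my @rules;
    for my $line ( split /\n/, $text ) {
        next if $line =~ /^\s*(?:#|$)/;    # skip comments and blank lines
        my ( $from, $to ) = split ' ', $line;
        # \Q...\E quotes the dots and slashes so they match literally
        push @rules, [ qr/^\Q$from\E/i, $to ];
    }
    return \@rules;
}

# Apply the first rule that matches; leave the URL alone if none do.
sub rewrite {
    my ( $rules, $url ) = @_;
    for my $rule (@$rules) {
        my ( $re, $to ) = @$rule;
        return $url if $url =~ s/$re/$to/;
    }
    return $url;
}

my $rules = compile_rules($config);
print rewrite( $rules, 'http://www.theirsite.com/page.html' ), "\n";
# prints http://our.server1/theirsite/page.html
```

This still walks the rules in order, so it only tidies up how the rules are written down; it doesn't answer the ordering question.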
Secondly, since only one rule gets applied to each incoming URL, the most frequently used rules (which we can test against the logs) should go near the top, and the others near the bottom. However, it's a royal PITA to test and develop this. Any ideas on how to benchmark this painlessly, or a better algorithm - perhaps something B-Tree-ish - to order the rules?
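One alternative I've been wondering about, instead of ordering the rules at all: since every rule keys on the hostname, pull the host out of the URL once and look it up in a hash, so the cost no longer depends on where a rule sits in the list. A minimal sketch, assuming every rule really is "this host goes to that prefix"; the %target_for table and the redirect function are hypothetical names, and it again works on the bare URL only:

```perl
use strict;
use warnings;

# Hypothetical host => target-prefix table, using the hosts from my example.
my %target_for = (
    'www.theirsite.com' => 'http://our.server1/theirsite/',
    'www.dummy.com'     => 'http://our.server2/dummy/',
);

sub redirect {
    my ($url) = @_;

    # Capture the host and, if present, everything after the first slash.
    # URLs with an explicit port do not match and are returned unchanged.
    if ( $url =~ m{^http://([^/:\s]+)(?:/(.*))?$}i ) {
        my $host = lc $1;
        my $rest = defined $2 ? $2 : '';
        return $target_for{$host} . $rest if exists $target_for{$host};
    }
    return $url;    # no rule for this host
}

print redirect('http://www.dummy.com/a/b?x=1'), "\n";
# prints http://our.server2/dummy/a/b?x=1
```

If that holds up, there'd be nothing to reorder or benchmark per rule, since a hash lookup costs about the same for every host.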
thanks
ViceRaid
Update: Clarified as per diotalevi's nit