
Hullo

I have a script that's nothing but a long list of simple s/// substitutions. It's a redirector script for Squid: customers point their domains at our machine, and that box forwards each request on to whichever of our servers should handle it. The script looks something like this:


while ( <> ) {
    ...
    elsif ( s|http://www.theirsite.com/\W|http://our.server1/theirsite/\n|i ) { }
    elsif ( s|http://www.theirsite.com/|http://our.server1/theirsite/|i ) { }
    elsif ( s|http://www.dummy.com/\W|http://our.server2/dummy/\n|i ) { }

    # .. ad nauseam
}

A couple of questions. First, I'd love some ideas for how to make this more straightforward, especially for defining the rules more clearly than as a long list of regexes. Maybe it could use a text config file that describes the simple, similar regexes, which would then be compiled at startup?
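To make the config-file idea concrete, here is a minimal sketch. The file format (one "source-host target-prefix" pair per line), the sub names load_rules and apply_rules, and anchoring each pattern at the start of the line are all assumptions for illustration, not something the existing script does. The rules are compiled once with qr// and then tried in order:

```perl
use strict;
use warnings;

# Hypothetical config format: one rule per line, "source-host  target-prefix".
# Blank lines and #-comments are skipped.
sub load_rules {
    my ($fh) = @_;
    my @rules;
    while ( my $line = <$fh> ) {
        $line =~ s/#.*//;             # strip comments
        next unless $line =~ /\S/;    # skip blank lines
        my ( $host, $target ) = split ' ', $line;
        # \Q...\E quotes the dots so they only match a literal "."
        push @rules, [ qr{^http://\Q$host\E/}i, $target ];
    }
    return \@rules;
}

# Apply the first matching rule to a line and return the result.
# Lines that match no rule are returned unchanged.
sub apply_rules {
    my ( $rules, $line ) = @_;
    for my $rule (@$rules) {
        my ( $re, $target ) = @$rule;
        last if $line =~ s/$re/$target/;
    }
    return $line;
}

# Example config held in a string; a real script would open a file instead.
my $config = <<'END';
# source host        target prefix
www.theirsite.com    http://our.server1/theirsite/
www.dummy.com        http://our.server2/dummy/
END

open my $fh, '<', \$config or die "Cannot read config: $!";
my $rules = load_rules($fh);
close $fh;
```

The main loop would then shrink to something like while (<>) { print apply_rules($rules, $_) }, and adding a customer becomes a one-line config change rather than a new pair of elsif branches.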

Secondly, since only one rule gets applied to each incoming URL, the most frequently used rules (which we can test against the logs) should go near the top, and the others near the bottom. However, it's a royal PITA to test and develop this. Any ideas on how to benchmark this painlessly, or a better algorithm - perhaps something B-Tree-ish - to order the rules?
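One way to sidestep the ordering problem entirely, assuming every rule really is "this host maps to that prefix", is to pull the host name out of the URL once and look it up in a hash. The sketch below swaps the regex list for a hash lookup; the %target_for table and the redirect sub are hypothetical names, and the table contents just mirror the example rules above:

```perl
use strict;
use warnings;

# Hypothetical host => target-prefix table; in practice this could be
# loaded from a config file at startup.
my %target_for = (
    'www.theirsite.com' => 'http://our.server1/theirsite/',
    'www.dummy.com'     => 'http://our.server2/dummy/',
);

# Rewrite one input line using a single hash lookup on the host name.
# Lines whose host is not in the table are returned unchanged.
sub redirect {
    my ($line) = @_;
    if ( $line =~ m{^http://([^/\s:]+)/}i ) {
        my $target = $target_for{ lc $1 };
        # $+[0] is the offset just past the matched "http://host/" prefix,
        # so only that prefix is replaced and the rest of the line is kept.
        substr( $line, 0, $+[0] ) = $target if defined $target;
    }
    return $line;
}
```

With this shape the cost per request is one regex match plus one hash lookup, however many customers there are, so there is nothing left to order. If the regex list has to stay, the core Benchmark module could be used to time alternative orderings against a sample of URLs taken from the logs.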

thanks
ViceRaid

Update: Clarified as per diotalevi's nit

In reply to Organising lots of simple regexes by ViceRaid
