
Assumptions:

You have access to a large subset of the files that these regular expressions will be used against.

If these regular expressions are simply being ranked in the abstract, you have some means of constantly accessing a variety of real-life sample files.

Write a daemon that continuously runs each regular expression against an ever-growing list of files. The daemon updates a table in which each row has two columns: the regular expression and its average line count.

Your output program simply sorts the table on the average line count column. So it is quick in that regard. However, as the daemon runs each regular expression against more and more files, the ranking may change.
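A minimal sketch of the table and the sort, in Python for illustration. It assumes "line count" means the number of lines in a file that the regular expression matches, which is my reading rather than something fixed above; the class and method names (`RegexTable`, `feed`, `ranking`) are made up for this example.

```python
import re
from collections import defaultdict


class RegexTable:
    """Keeps a running average of matching lines per file for each pattern."""

    def __init__(self, patterns):
        self.compiled = {p: re.compile(p) for p in patterns}
        self.files_seen = defaultdict(int)   # pattern -> number of files run
        self.total_lines = defaultdict(int)  # pattern -> total matching lines

    def feed(self, text):
        """Run every pattern against one file's text and update the totals."""
        lines = text.splitlines()
        for pattern, rx in self.compiled.items():
            # Assumption: a line counts once if the pattern matches anywhere in it.
            hits = sum(1 for line in lines if rx.search(line))
            self.files_seen[pattern] += 1
            self.total_lines[pattern] += hits

    def average(self, pattern):
        return self.total_lines[pattern] / self.files_seen[pattern]

    def ranking(self):
        """Patterns sorted by average line count, highest first."""
        return sorted(self.files_seen, key=self.average, reverse=True)
```

The daemon would call `feed` once per new file; the output program only needs `ranking`, which is a single sort over values that are already stored.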

Obviously, newly added regular expressions will have a more volatile rank than older ones that have been run against thousands of files. To combat this problem, you could require that a regular expression be run against a minimum number of files before it shows up in the table. For speed, you could have the daemon give priority to newly added regular expressions until their rank stabilizes. In fact, both of these should be configurable as tuning parameters of the daemon.
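Those two knobs could look something like this. Again a Python sketch under assumptions: `stats` is a hypothetical dict mapping each pattern to a `(files_seen, total_line_count)` pair, `min_files` is the hypothetical tuning parameter, and "stabilized" is approximated as "has reached `min_files`", which is cruder than actually watching the rank settle.

```python
def visible_ranking(stats, min_files):
    """Rank only the patterns that have been run against at least min_files files.

    stats maps pattern -> (files_seen, total_line_count).
    """
    averages = {
        pattern: total / seen
        for pattern, (seen, total) in stats.items()
        if seen >= min_files
    }
    return sorted(averages, key=averages.get, reverse=True)


def next_pattern(stats, min_files):
    """Choose which pattern the daemon should run next.

    Patterns still below min_files go first; among the candidates,
    the one with the fewest files seen so far wins.
    """
    under_sampled = [p for p, (seen, _) in stats.items() if seen < min_files]
    pool = under_sampled or list(stats)
    return min(pool, key=lambda p: stats[p][0])
```

With `stats = {"old": (1000, 5000), "new": (2, 40)}` and `min_files=3`, the new pattern stays out of the visible ranking but is the one the daemon picks up next.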

What I like about this approach is that it throws all the theoretical junk out the door. Brute force can be ugly, but then again, the map is not the territory, and brute force reveals the territory.

In reply to Re: Analysis of Regular Expressions by jffry
in thread Analysis of Regular Expressions by PetaMem
