PetaMem has asked for the wisdom of the Perl Monks concerning the following question:
Hi Monks,
This one is certainly a little difficult/hairy/tricky, and I'm aware that there probably can't be a definitive answer. In fact, I had a hard time even deciding whether to put this into Seekers or Meditations.
I need a function that returns either the "genericity" or the "specificity" of a regular expression. The idea is that, according to a measure of "specificity", the following regular expressions are sorted from the most specific to the most generic:
1) \AHi\z
2) \AHi
3) Hi
4) \b(Hi(ya)?|Hello|Greetings)\b
5) (Hi(ya)?|Hello|Greetings)
6) .*
In other words, a regular expression is more specific than another if it "matches less" than the other one. (Well, and there's the problem: it is questionable whether 4) isn't more specific than 3).)
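To make "matches less" concrete, here is a rough sketch of an empirical measure: count how many strings from a sample corpus each pattern matches, and treat a lower count as more specific. The corpus contents and the name match_count are made up for illustration; the resulting order depends entirely on which samples are chosen.

```perl
use strict;
use warnings;

# Hypothetical sample corpus; a real one would come from the application's data.
my @samples = (
    'Hi', 'Hi there', 'Oh Hi', 'Hiya', 'Hello world',
    'Greetings', 'Highway', 'Chitchat', 'nothing here', '',
);

# Number of samples a pattern matches: lower means more specific.
sub match_count {
    my ($rx, $samples) = @_;
    my $hits = grep { $_ =~ $rx } @$samples;
    return $hits;
}

my @patterns = (
    qr/\AHi\z/,
    qr/\AHi/,
    qr/Hi/,
    qr/\b(Hi(ya)?|Hello|Greetings)\b/,
    qr/(Hi(ya)?|Hello|Greetings)/,
    qr/.*/,
);

# Sort from most specific (fewest matches) to most generic.
my @sorted = sort {
    match_count($a, \@samples) <=> match_count($b, \@samples)
} @patterns;

printf "%2d  %s\n", match_count($_, \@samples), $_ for @sorted;
```

With this particular corpus the six patterns come out in the order listed above, but a different corpus could easily swap 3) and 4). It is also not fast unless the counts are cached per pattern.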
Any creative ideas on how to achieve this? Oh yes, and the computation of the "genericity/specificity" should be fast. The best I have come up with so far is to dissect each regular expression with other regular expressions, adding or subtracting "weight" at each occurrence of different metacharacters, with very limited success.
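For reference, the weighting idea looks roughly like the sketch below. The function names and every weight are assumptions, hand-tuned to these six examples only; the probes are deliberately naive (they would, for instance, miscount the ? in (?:...) groups or in lazy quantifiers).

```perl
use strict;
use warnings;

# Count non-overlapping occurrences of a probe regex in the pattern source.
# The probes must not contain capture groups.
sub count_matches {
    my ($text, $rx) = @_;
    my $n = () = $text =~ /$rx/g;
    return $n;
}

# Higher score means more specific. All weights are hypothetical.
sub specificity_score {
    my ($src) = @_;
    my $score = 0;
    $score += 3 * count_matches($src, qr/\\[Az]/);        # \A and \z anchors
    $score += 1 * count_matches($src, qr/\\b/);           # word boundaries
    $score -= 1 * count_matches($src, qr/(?<!\\)\|/);     # alternation
    $score -= 1 * count_matches($src, qr/(?<!\\)\?/);     # optional parts
    $score -= 5 * count_matches($src, qr/(?<!\\)\.[*+]/); # .* and .+
    return $score;
}

# Pattern sources as plain strings (single-quoted, so backslashes survive).
my @sources = (
    '\AHi\z',
    '\AHi',
    'Hi',
    '\b(Hi(ya)?|Hello|Greetings)\b',
    '(Hi(ya)?|Hello|Greetings)',
    '.*',
);

# Sort from most specific (highest score) to most generic.
my @sorted = sort { specificity_score($b) <=> specificity_score($a) } @sources;
printf "%3d  %s\n", specificity_score($_), $_ for @sorted;
```

This is fast, since it only scans the pattern text, but the scores say nothing about what the patterns actually match, which is why the results are so fragile.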
So if anyone has a good idea or maybe a pointer to something similar that has been done before, I'd love to see that.
Bye
PetaMem All Perl: MT, NLP, NLU