PerlMonks  

Re: Most specific pattern

by tphyahoo (Vicar)
on Jul 01, 2005 at 16:44 UTC


in reply to Most specific pattern

I agree with a part of what jhourcle said above. The best fit would be the regex that matches the string but has the LEAST odds of matching in general.

For example, let's say you have the regex "blah.*blah" and the regex "blah", and a string that matches both.

I would say that the best match is the first, because it has smaller odds of matching any given string of length n.

This seems to cover all cases to me. Where would this not be true?
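The "smaller odds" claim can be checked by brute force on a toy case. The sketch below is in Python purely for illustration (the two patterns behave the same way in Perl). The four-letter alphabet and the length of 8 are assumptions chosen to keep the enumeration small; they are not from the thread.

```python
import re
from itertools import product

ALPHABET = "abhl"   # assumed toy alphabet: just the letters of "blah"
LENGTH = 8          # assumed string length n


def count_matches(pattern, alphabet=ALPHABET, length=LENGTH):
    """Count the strings of the given length that the pattern matches anywhere."""
    rx = re.compile(pattern)
    return sum(
        1
        for chars in product(alphabet, repeat=length)
        if rx.search("".join(chars))
    )


counts = {p: count_matches(p) for p in ("blah", "blah.*blah")}
total = len(ALPHABET) ** LENGTH
for pattern, hits in counts.items():
    print(f"{pattern!r}: {hits} of {total} strings")
```

Every string matched by "blah.*blah" also contains "blah", so its count can never exceed the count for "blah"; the enumeration just shows how large the gap is for this one alphabet and length.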

Re^2: Most specific pattern
by thor (Priest) on Jul 01, 2005 at 17:02 UTC
    I agree with this...however, the devil's in the details. Given an arbitrary regex, how does one create a metric around how many non-wildcarded characters are in it? Is there a module that takes care of this? Or at least one that one could bend to make it fit?

    thor

    Feel the white light, the light within
    Be your own disciple, fan the sparks of will
    For all of us waiting, your kingdom will come

      That was just one of the metrics that I could think of ... unless the number of regexes were so large that you couldn't rank them yourself (going on the assumption that I know more about the process than a regex does).

      If I had to go completely on just odds of matching, I would think it'd be easiest to take a representative sample of inputs, and test them against each of the regexes, and build a table with the odds.
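A minimal sketch of that sampling idea, in Python for illustration. The function names `match_odds` and `most_specific` and the sample strings are hypothetical; in practice the samples would come from a real log of inputs.

```python
import re


def match_odds(patterns, samples):
    """Return {pattern: fraction of the samples that the pattern matches}."""
    if not samples:
        raise ValueError("need at least one sample input")
    table = {}
    for pattern in patterns:
        rx = re.compile(pattern)
        hits = sum(1 for s in samples if rx.search(s))
        table[pattern] = hits / len(samples)
    return table


def most_specific(patterns, samples, text):
    """Among the patterns that match text, pick the one with the lowest odds."""
    odds = match_odds(patterns, samples)
    candidates = [p for p in patterns if re.search(p, text)]
    return min(candidates, key=odds.get) if candidates else None


# Hypothetical sample of past inputs.
samples = ["blah", "blah blah", "foo", "blah foo blah", "bar", "foo blah"]
patterns = ["blah", "blah.*blah"]

odds = match_odds(patterns, samples)
best = most_specific(patterns, samples, "blah and blah")
print(odds, best)
```

The table is only as good as the sample: a pattern that never matched anything in the log gets odds of zero and wins every tie, which may or may not be what you want.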

      If you don't have a log of those inputs for testing, then we'd have to get more creative ... I might use something like the following --

      • Any character or zero-width assertion gets 1 point (unless the assertion is pointless, like the \b in '\W\b\w').
      • Any character class of n characters gets f(n) points, where f(n) yields a number less than one and decreases as n increases (maybe 1/n, or sqrt(1/n)).
      • Quantifiers reduce the value of the items they modify ... perhaps as multipliers ( ? = 0.5; + = 0.6; * = 0.25; +? = 0.7; *? = 0.35 ). (I'm just pulling numbers out of the air ... you'd want to tweak the numbers until you get good results for your situation.)
      • Alternations provide something less than the points value of each of their possibilities. (I have no clue on a formula for this one...)
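The first three rules above can be sketched as a small scorer, again in Python for illustration. This is a rough sketch under several assumptions that are not in the thread: f(n) is taken as 1/n, '.' is scored as a class of 256 characters, \d, \w and \s are scored as classes of 10, 63 and 6 characters, and a bracketed class is sized by counting the characters between the brackets (ranges and negation are not expanded). Groups, alternation and {n,m} quantifiers are not handled, and pointless assertions are not detected.

```python
import re

# Multipliers from the list above (numbers "out of the air", meant to be tuned).
QUANTIFIER_WEIGHT = {"?": 0.5, "+": 0.6, "*": 0.25, "+?": 0.7, "*?": 0.35}

# Assumed sizes for shorthand classes and for '.', not taken from the thread.
SHORTHAND_SIZE = {"d": 10, "w": 63, "s": 6}
DOT_SIZE = 256

# One atom: an escape sequence, a bracketed class, or any single character.
TOKEN = re.compile(r"\\.|\[\^?\]?[^\]]*\]|.", re.S)
# A quantifier that may follow an atom (lazy forms are tried first).
QUANT = re.compile(r"[+*]\?|[?+*]")


def class_points(n):
    """f(n) = 1/n: the larger the class, the fewer points it earns."""
    return 1.0 / max(n, 1)


def atom_points(tok):
    """Score one atom before any quantifier is applied."""
    if tok == ".":
        return class_points(DOT_SIZE)
    if tok.startswith("[") and len(tok) > 1:
        # Rough size: characters listed between the brackets.
        return class_points(len(tok[1:-1]))
    if len(tok) == 2 and tok[0] == "\\" and tok[1] in SHORTHAND_SIZE:
        return class_points(SHORTHAND_SIZE[tok[1]])
    return 1.0  # literal character or zero-width assertion


def specificity(pattern):
    """Sum the points of every atom, scaled by its quantifier if it has one."""
    score, pos = 0.0, 0
    while pos < len(pattern):
        tok = TOKEN.match(pattern, pos).group()
        pos += len(tok)
        points = atom_points(tok)
        quant = QUANT.match(pattern, pos)
        if quant:
            points *= QUANTIFIER_WEIGHT[quant.group()]
            pos = quant.end()
        score += points
    return score


for pattern in ("blah", "blah.*blah"):
    print(pattern, specificity(pattern))
```

With these weights "blah.*blah" scores higher than "blah", because the '.*' adds almost nothing while the second literal "blah" adds four points, which agrees with the ranking argued earlier in the thread.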

      I'm not aware of a module to do this sort of thing, but that doesn't mean that there isn't one out there.
