Beefy Boxes and Bandwidth Generously Provided by pair Networks
The stupid question is the question not asked
 
PerlMonks  

comment on

( [id://3333]=superdoc: print w/replies, xml ) Need Help??
This process basically has a config file that has a bunch of regexes, and for each regex, an action. Items are passed to the process, and it determines the "best fit" and executes that action associated with it. However, in reading the documentation, it seems that the metric that they use for "best fit" is the pattern with the fewest wildcard characters.

Are you sure the documentation is a correct description of what the code does? (Are you sure you are interpreting it correctly?)

When a given input is able to match two or more distinct regexes in a set (and the chosen action differs for each regex), then there are two ways to choose among the possible paths:

  • If the regexes are applied in a specific order and the first one to match determines the action, then there's a simple and unavoidable rule, similiar to the idea in tphyahoo's reply: the regexes need to be ordered from longest specific match to shortest. Otherwise, the longer, more specific regexes would never get a chance to apply.

  • If all regexes are tested before any action is taken, and their successes are scored in some way before deciding which action to take, this would provide some flexibility in defining how the scoring works (you could even introduce probabilities, if you wanted results that were non-deterministic but could still correlate in some way with the inputs).

    But for most systems (where people want the results to be deterministic), it's more likely that the best way to score would be based on the length of the match, or rather, the difference between the length of the input and the length of the match. If two regexes yield the same minimum length difference between input and match, then the next criterion might be some sort of ranking among the regexes in terms of their relative literalness or specificity -- e.g. the one using more literals, smaller character classes and/or smaller quantity qualifiers would rank higher. But it's hard to conceive of a programmatic approach to ranking regexes this way.


In reply to Re: Most specific pattern by graff
in thread Most specific pattern by thor

Title:
Use:  <p> text here (a paragraph) </p>
and:  <code> code here </code>
to format your post; it's "PerlMonks-approved HTML":



  • Are you posting in the right place? Check out Where do I post X? to know for sure.
  • Posts may use any of the Perl Monks Approved HTML tags. Currently these include the following:
    <code> <a> <b> <big> <blockquote> <br /> <dd> <dl> <dt> <em> <font> <h1> <h2> <h3> <h4> <h5> <h6> <hr /> <i> <li> <nbsp> <ol> <p> <small> <strike> <strong> <sub> <sup> <table> <td> <th> <tr> <tt> <u> <ul>
  • Snippets of code should be wrapped in <code> tags not <pre> tags. In fact, <pre> tags should generally be avoided. If they must be used, extreme care should be taken to ensure that their contents do not have long lines (<70 chars), in order to prevent horizontal scrolling (and possible janitor intervention).
  • Want more info? How to link or How to display code and escape characters are good places to start.
Log In?
Username:
Password:

What's my password?
Create A New User
Domain Nodelet?
Chatterbox?
and the web crawler heard nothing...

How do I use this?Last hourOther CB clients
Other Users?
Others examining the Monastery: (5)
As of 2024-04-18 17:25 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?

    No recent polls found