Other possibilities for scoring that we've thought about are the length of the match (regexes that match more of an example are scored higher) and specificity (regexes that are more specific are scored higher). For example, qr/^[A-Z]{2}$/ is more specific than qr/^\w+$/, and qr/^.+$/ is so non-specific that we don't even consider it valid.
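To make those two ideas concrete, here is a minimal sketch of how they might be scored. The sub names (count_matches, coverage_score, specificity_score) and the weights are made up for illustration and are not taken from the example code. The specificity check is only a crude heuristic on the pattern text, not a real measure of what a regex can match.

```perl
use strict;
use warnings;

# Count how many times $re matches inside $str.
sub count_matches {
    my ($str, $re) = @_;
    my $n = () = $str =~ /$re/g;
    return $n;
}

# Match length: average fraction of each example covered by the match.
sub coverage_score {
    my ($re, @examples) = @_;
    return 0 unless @examples;
    my $total = 0;
    for my $ex (@examples) {
        next unless length($ex);
        if ($ex =~ $re) {
            # @- and @+ hold the start and end offsets of the last match
            $total += ($+[0] - $-[0]) / length($ex);
        }
    }
    return $total / @examples;
}

# Specificity: penalise wildcards and open-ended quantifiers, reward
# explicit character classes and exact counts. The weights are arbitrary.
sub specificity_score {
    my ($pattern) = @_;    # pattern source as a plain string
    my $score = 0;
    $score -= 3 * count_matches($pattern, qr/(?<!\\)\./);    # bare dot
    $score -= 2 * count_matches($pattern, qr/\\[wWsSdD]/);   # broad shorthand classes
    $score -= 1 * count_matches($pattern, qr/[+*]/);         # open-ended quantifiers
    $score += 1 * count_matches($pattern, qr/\{\d+\}/);      # exact repeat counts
    $score += 1 * count_matches($pattern, qr/\[[^\]]+\]/);   # explicit classes
    return $score;
}

# Rank the three patterns mentioned above, most specific first.
my @ranked = sort { specificity_score($b) <=> specificity_score($a) }
             ('^.+$', '^\w+$', '^[A-Z]{2}$');
print "$_\n" for @ranked;
```

A cutoff on the specificity score would be one way to reject something like qr/^.+$/ outright, though where to put that cutoff is a judgment call.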

Of course, this points out another weakness in the approach the example code uses: it only considers left-anchored regexes, so it tends not to notice commonalities on the right-hand side (or anywhere else in the data, for that matter).
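As a rough illustration of the right-hand-side problem, the sketch below looks for a literal suffix shared by all the examples and builds a right-anchored regex from it. The subs common_prefix and common_suffix are hypothetical helpers, not part of the example code, and a literal suffix is only the simplest kind of commonality one could look for.

```perl
use strict;
use warnings;

# Longest literal prefix shared by every string in the list.
sub common_prefix {
    my @strings = @_;
    return '' unless @strings;
    my $prefix = shift @strings;
    for my $s (@strings) {
        # Shorten the candidate until it is a prefix of $s (or empty)
        chop $prefix while length($prefix) && index($s, $prefix) != 0;
    }
    return $prefix;
}

# Longest shared suffix: reverse everything and reuse common_prefix.
sub common_suffix {
    my @reversed = map { scalar reverse $_ } @_;
    my $rev_prefix = common_prefix(@reversed);
    return scalar reverse $rev_prefix;
}

my @examples = ('report.txt', 'notes.txt', 'a.txt');
my $suffix   = common_suffix(@examples);

# These examples share nothing on the left, so a left-anchored search
# finds no literal to anchor on; a right-anchored regex still can.
my $right_anchored = qr/\Q$suffix\E$/;
print "suffix regex: $right_anchored\n" if length $suffix;
```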

I'm not saying we've got the problem solved, or that it's even tractable in the general case. We just have an approach that works for some cases.


In reply to Re: Re: Re: (FoxUni) Re: generating regexes? by mortis
in thread generating regexes? by mortis
