Your observation that /\d/ and /[0-9]/ are loosely equivalent in regexen

(see Jenda's Re^3: Is regex 1 covered by regex 2;
note also that "equivalent" was my word choice; hdb's phrasing was clearly better)

is precisely why "similarity... as strings" (as judged by Text::Levenshtein or any package based on a slight variation of that definition) is irrelevant in light of OP's specific objective of identifying equivalent regexen ("if a new regex given is already covered by another").

From perldoc Text::Levenshtein:

This module implements the Levenshtein edit distance, which measures the difference between two strings, in terms of the *edit distance*. This distance is the number of substitutions, deletions or insertions ("edits") needed to transform one string into the other one (and vice versa)....
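
To make that concrete, here is a minimal sketch that compares regex source text purely as strings. The distance routine is hand-rolled so that nothing from CPAN is assumed, and the sub name lev is mine; Text::Levenshtein's distance() should report the same numbers.

```perl
use strict;
use warnings;
use List::Util qw(min);

# Classic dynamic-programming Levenshtein distance, keeping one row at a time.
sub lev {
    my ($s, $t) = @_;
    my @prev = (0 .. length $t);          # distance from the empty prefix of $s
    for my $i (1 .. length $s) {
        my @cur = ($i);                   # distance to the empty prefix of $t
        for my $j (1 .. length $t) {
            my $cost = substr($s, $i - 1, 1) eq substr($t, $j - 1, 1) ? 0 : 1;
            push @cur, min(
                $prev[$j] + 1,            # deletion
                $cur[$j - 1] + 1,         # insertion
                $prev[$j - 1] + $cost,    # substitution (or match)
            );
        }
        @prev = @cur;
    }
    return $prev[-1];
}

# Regex *source text*, treated as ordinary strings.
printf "%-14s %d\n", '\d vs [0-9]:', lev('\d', '[0-9]');
printf "%-14s %d\n", '\d vs \D:',    lev('\d', '\D');
```

The pair that matches nearly the same text (\d and [0-9]) comes out 5 edits apart, while the pair that matches complementary text (\d and \D) is a single edit apart. The number says nothing about whether one regex covers the other.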

I suspect (emphasis on "suspect"; I lack the time just now to test the suspicion) that an approach with some chance of success in OP's terms would involve using the regex engine itself, but NOT by testing variants against identical data. There would be far too many uncovered possibilities and edge cases, and merely having two regexen match a particular text would lack rigor.
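
As a small illustration of why agreement on sample data proves little, here is a sketch with two patterns I made up for the purpose (they are not from OP's data, and the sub name disagreements is likewise mine). They agree on every string in a plausible-looking sample and still are not equivalent:

```perl
use strict;
use warnings;

# Count the strings on which exactly one of the two patterns matches.
sub disagreements {
    my ($x, $y, @texts) = @_;
    return scalar grep { ($_ =~ $x) xor ($_ =~ $y) } @texts;
}

# Hypothetical patterns, chosen only for illustration.
my $re1 = qr/^a+$/;
my $re2 = qr/^a{1,5}$/;

my @sample = ('', 'a', 'aa', 'aaaaa', 'b', 'ab');

print "disagreements on the sample: ",
    disagreements($re1, $re2, @sample), "\n";
print "disagreements on one more string: ",
    disagreements($re1, $re2, 'a' x 6), "\n";
```

The sample reports no disagreement, yet a string of six "a" characters separates the two patterns. Any finite test set can miss a case like that.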

A study of the code used in various regex testers or tutors might be profitable.

Re^3: Is regex 1 covered by regex 2
by Jenda (Abbot) on Apr 29, 2015 at 14:56 UTC

    The two, \d and [0-9], are not equivalent unless you use the /a modifier. By someone's (IMNSHO incorrect) decision, \d was implemented to mean "anything that might be considered a digit in languages you will never hear of", not "number understood by Perl and usable in computation". While the likelihood that you end up with a string containing any such characters is pretty slim, you should play it safe and either use [0-9] or the /a modifier!
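
    A quick sketch of the difference, assuming Perl 5.14 or later (where /a is available). The sub name digit_checks and the choice of test character are only for illustration:

```perl
use 5.014;      # the /a modifier needs 5.14+
use warnings;

# Report which of the three patterns match a single character.
sub digit_checks {
    my ($ch) = @_;
    return {
        default => ($ch =~ /^\d$/     ? 1 : 0),   # Unicode-aware \d
        ascii   => ($ch =~ /^\d$/a    ? 1 : 0),   # /a restricts \d to 0-9
        class   => ($ch =~ /^[0-9]$/  ? 1 : 0),   # explicit ASCII range
    };
}

# U+0663 ARABIC-INDIC DIGIT THREE: a digit, but not one of 0-9.
my $r = digit_checks("\x{0663}");
printf "%-8s %d\n", $_, $r->{$_} for qw(default ascii class);
```

    Only the plain \d matches that character; \d under /a and [0-9] both reject it.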

    Jenda
    Enoch was right!
    Enjoy the last years of Rome.

Re^3: Is regex 1 covered by regex 2
by hdb (Monsignor) on Apr 29, 2015 at 12:17 UTC

    I fully support your comments. It is really up to Kafka to see whether such a simple approach would be sufficient. This probably also depends on the sophistication he can expect from his users (the more sophisticated they are, the more likely it is that this approach will not work). Whether or not testing against data is useful would also depend on the context. If the data is from a limited domain, it could work. My suspicion, though, is that he wants to do the checks to avoid unnecessary matching against data...