Is regex 1 covered by regex 2

Kafka has asked for the wisdom of the Perl Monks concerning the following question:

Replies are listed 'Best First'.
Re: Is regex 1 covered by regex 2 by choroba (Cardinal) on Apr 29, 2015 at 10:14 UTC
You might benefit from reading chapter 6.5 Regex String Generation from the "Higher Order Perl" by Mark Jason Dominus (available online at http://hop.perl.plover.com/book/). Update:Fixed the author. لսႽ† ᥲᥒ⚪⟊Ⴙᘓᖇ Ꮅᘓᖇ⎱ Ⴙᥲ𝇋ƙᘓᖇ	[reply]
Re: Is regex 1 covered by regex 2 by hdb (Monsignor) on Apr 29, 2015 at 10:58 UTC
There is probably no simple solution as the regexes can look very different but mean (nearly the same), like `/\d/` and `/[0-9]/`. You could check for the similarity of the regexes when seeing them as strings: `use strict; use warnings; use Text::Levenshtein 'distance'; my @old = ( 'abc.f', 'hello world\d', '1234' ); my $new = 'abcd.f'; for (@old) { print "Warning: $new looks like $_\n" if distance( $new, $_ ) < 6; }` [download] There are a lot of possible measures of distance around.	[reply] [d/l] [select]
Re^2: Is regex 1 covered by regex 2 by ww (Archbishop) on Apr 29, 2015 at 11:48 UTC
Your observation that `/\d/` and `/[0-9]/` are (edit-insert: `s/equivalent/loosely equivalent`) loosely equivalent in regexen (see Jenda's Re^3: Is regex 1 covered by regex 2; note also that "equivalent" was my word choice; hdb's phrasing was clearly better) is precisely why "similarity... as strings" (as judged by Text::Levenshtein or any package based on a minimally variant definition) is irrelevant in light of OP's specific objective of identifying equivalent regexen ("if a new regex given is already covered by another". From `perldoc Text::Levenshtein`: `This module implements the Levenshtein edit distance, which measures the difference between two strings, in terms of the edit distance. This distance is the number of substitutions, deletions or insertions ("edits") needed to transform one string into the other one (and vice versa)....` I suspect (emphasis "suspect"; but lack the time just now to test the suspicion) that an approach with some chance of success in OP's terms would involve using the regex engine itself (but NOT by testing variants against identical data... where there would be far too many un-covered possibilities; edge cases and other instances where having two regexen match a particular text would lack rigor. A study of the code used in various regex testers or tutors might be profitable. Spirit of the Monastery	[reply] [d/l] [select]
Re^3: Is regex 1 covered by regex 2 by Jenda (Abbot) on Apr 29, 2015 at 14:56 UTC
The two, \d and `[0-9]`, are not equivalent unless you use the /a modifier. By someone's IMNSHO incorrect decision, \d was implemented to mean "anything that might be considered a digit in languages you will never hear of", not "number understood by Perl and usable in computation". While the likelihood that you end up with a string containing any such characters is pretty slim, you should play safe and either use `[0-9]` or the /a modifier! Jenda Enoch was right! Enjoy the last years of Rome.	[reply] [d/l] [select]
Re^3: Is regex 1 covered by regex 2 by hdb (Monsignor) on Apr 29, 2015 at 12:17 UTC
I fully support your comments. It is really up to Kafka to see whether such a simple approach would be sufficient. This probably also depends on the sophistication he can expect from his users (the more sophisticated they are the more likely this approach will not work). Whether or not testing against data is useful, would also depend on the context. If the data is from a limited domain, it could work. My suspicion though is, that he wants to do the checks to avoid unnecessary matching against data...	[reply]
Re: Is regex 1 covered by regex 2 by Anonymous Monk on Apr 29, 2015 at 10:16 UTC
Its kinda hard, but see regex tester (link to search) by davido?, PPIx::Regexp...	[reply]
Re^2: Is regex 1 covered by regex 2 by Discipulus (Canon) on Apr 29, 2015 at 14:20 UTC
davido's precious site to test regular expressions is now here There are no rules, there are no thumbs.. Reinvent the wheel, then learn The Wheel; may be one day you reinvent one of THE WHEELS.	[reply] [d/l]
Re^2: Is regex 1 covered by regex 2 by Anonymous Monk on Apr 29, 2015 at 22:16 UTC
more links to regex parser tools, The Perl Regex Tester, Re^3: My Favourite Regex Tools (Was: Parsing a Variable Format String)	[reply]
Re: Is regex 1 covered by regex 2 by jeffa (Bishop) on Apr 29, 2015 at 16:10 UTC
"... if a new regex given is already covered by another ..." You left this requirement open to ambiguous interpretation. Do you want an exact match or do you want fuzzy matching? For example: `abc.f` will match all the things that `abc\wf` will match, but not vice versa. So if `abc.f` is already issued and the user tries to issue `abc\wf`, should your program issue a warning? In the off-change that it should not ... then all you need to is store each submitted value from user input into a hash as a key and increment the value for that key each time the key is seen: `my %seen; while (chomp(my $input = <STDIN>)) { exit unless $input; $seen{$input}++; warn "you already issued $input\n" if $seen{$input} > 1; print "issued: ", join(', ', sort keys %seen ) ,$/; }` [download] If this is correct, then the fact that you are storing regular expressions is misleading -- you really just want to store strings that will be used as regular expressions later. Please clarify. :) jeffa L-LL-L--L-LL-L--L-LL-L-- -R--R-RR-R--R-RR-R--R-RR B--B--B--B--B--B--B--B-- H---H---H---H---H---H--- (the triplet paradiddle with high-hat)	[reply] [d/l] [select]
Re^2: Is regex 1 covered by regex 2 by Kafka (Acolyte) on Apr 30, 2015 at 08:23 UTC
Hi In the case you gave abc.*f covers all the strings that abc\wf does, so a warning should be issued.	[reply]
Re: Is regex 1 covered by regex 2 by Laurent_R (Canon) on Apr 29, 2015 at 18:06 UTC
Hum, a very limited try at using the regex engine to simplify the string entered by the user and see if it can be matched by an existing regex. The idea is to remove any occurrence of `.`, to replace `(.)+` by a single occurrence of the letter matched by `(.)`, etc. This is very brief attempt under the debugger: $ perl -dE 42 Loading DB routines from perl5db.pl version 1.33 Editor support available. Enter h or `h h' for help, or `man perldebug' for more help. main::(-e:1): 42 DB<1> $regex = qr/abc.f/; DB<2> $new = "abcd.f"; DB<3> $new =~ s/.\//g; DB<4> $new =~s/(.)\+/$1/g; DB<5> p $new; abcdf DB<6> print "true" if $new =~ /$regex/; true DB<7> [download] I do not know whether this approach can lead to something interesting, but it would be very hard to make it error proof. Update 18:13 UTC: fixed a typo in line `DB<4>`. Je suis Charlie.	[reply] [d/l] [select]
Re: Is regex 1 covered by regex 2 by MidLifeXis (Monsignor) on Apr 29, 2015 at 12:29 UTC
++ and Front-paged not so much for this particular question, but for the discussion on the general concept: identifying sub-grammars within a larger grammar. A topic like this tickles my CS fancy, and I am looking forward to the discussion it could generate. --MidLifeXis	[reply]