Re: Is regex 1 covered by regex 2
by choroba (Cardinal) on Apr 29, 2015 at 10:14 UTC
|
You might benefit from reading chapter 6.5 Regex String Generation from the "Higher Order Perl" by Mark Jason Dominus (available online at http://hop.perl.plover.com/book/).
Update:Fixed the author.
| [reply] |
Re: Is regex 1 covered by regex 2
by hdb (Monsignor) on Apr 29, 2015 at 10:58 UTC
|
There is probably no simple solution as the regexes can look very different but mean (nearly the same), like /\d/ and /[0-9]/. You could check for the similarity of the regexes when seeing them as strings:
use strict;
use warnings;
use Text::Levenshtein 'distance';
my @old = (
'abc.*f',
'hello world\d',
'1234' );
my $new = 'abcd.*f';
for (@old) {
print "Warning: $new looks like $_\n" if distance( $new, $_ ) < 6;
}
There are a lot of possible measures of distance around. | [reply] [d/l] [select] |
|
|
Your observation that /\d/ and /[0-9]/ are (edit-insert: s/equivalent/loosely equivalent) loosely equivalent in regexen
(see Jenda's Re^3: Is regex 1 covered by regex 2; note also that "equivalent" was my word choice; hdb's phrasing was clearly better)
is precisely why "similarity... as strings" (as judged by Text::Levenshtein or any package based on a minimally variant definition) is irrelevant in light of OP's specific objective of identifying equivalent regexen ("if a new regex given is already covered by another".
From perldoc Text::Levenshtein:
This module implements the Levenshtein edit distance, which measures the
difference between two strings, in terms of the *edit distance*. This distance is the number of substitutions, deletions or insertions ("edits") needed to transform one string into the other one (and vice versa)....
I suspect (emphasis "suspect"; but lack the time just now to test the suspicion) that an approach with some chance of success in OP's terms would involve using the regex engine itself (but NOT by testing variants against identical data... where there would be far too many un-covered possibilities; edge cases and other instances where having two regexen match a particular text would lack rigor.
A study of the code used in various regex testers or tutors might be profitable.
| [reply] [d/l] [select] |
|
|
The two, \d and [0-9], are not equivalent unless you use the /a modifier. By someone's IMNSHO incorrect decision, \d was implemented to mean "anything that might be considered a digit in languages you will never hear of", not "number understood by Perl and usable in computation". While the likelihood that you end up with a string containing any such characters is pretty slim, you should play safe and either use [0-9] or the /a modifier!
Jenda
Enoch was right!
Enjoy the last years of Rome.
| [reply] [d/l] [select] |
|
|
I fully support your comments. It is really up to Kafka to see whether such a simple approach would be sufficient. This probably also depends on the sophistication he can expect from his users (the more sophisticated they are the more likely this approach will not work). Whether or not testing against data is useful, would also depend on the context. If the data is from a limited domain, it could work. My suspicion though is, that he wants to do the checks to avoid unnecessary matching against data...
| [reply] |
Re: Is regex 1 covered by regex 2
by Anonymous Monk on Apr 29, 2015 at 10:16 UTC
|
| [reply] |
|
|
| [reply] [d/l] |
|
|
| [reply] |
Re: Is regex 1 covered by regex 2
by jeffa (Bishop) on Apr 29, 2015 at 16:10 UTC
|
"... if a new regex given is already covered by another ..."
You left this requirement open to ambiguous interpretation. Do you want an exact match or do you want fuzzy matching? For example: abc.*f will match all the things that abc\wf will match, but not vice versa. So if abc.*f is already issued and the user tries to issue abc\wf, should your program issue a warning?
In the off-change that it should not ... then all you need to is store each submitted value from user input into a hash as a key and increment the value for that key each time the key is seen:
my %seen;
while (chomp(my $input = <STDIN>)) {
exit unless $input;
$seen{$input}++;
warn "you already issued $input\n" if $seen{$input} > 1;
print "issued: ", join(', ', sort keys %seen ) ,$/;
}
If this is correct, then the fact that you are storing regular expressions is misleading -- you really just want to store strings that will be used as regular expressions later. Please clarify. :)
jeffa
L-LL-L--L-LL-L--L-LL-L--
-R--R-RR-R--R-RR-R--R-RR
B--B--B--B--B--B--B--B--
H---H---H---H---H---H---
(the triplet paradiddle with high-hat)
| [reply] [d/l] [select] |
|
|
Hi In the case you gave abc.*f covers all the strings that abc\wf does, so a warning should be issued.
| [reply] |
Re: Is regex 1 covered by regex 2
by Laurent_R (Canon) on Apr 29, 2015 at 18:06 UTC
|
$ perl -dE 42
Loading DB routines from perl5db.pl version 1.33
Editor support available.
Enter h or `h h' for help, or `man perldebug' for more help.
main::(-e:1): 42
DB<1> $regex = qr/abc.*f/;
DB<2> $new = "abcd.*f";
DB<3> $new =~ s/.\*//g;
DB<4> $new =~s/(.)\+/$1/g;
DB<5> p $new;
abcdf
DB<6> print "true" if $new =~ /$regex/;
true
DB<7>
I do not know whether this approach can lead to something interesting, but it would be very hard to make it error proof.
Update 18:13 UTC: fixed a typo in line DB<4>.
| [reply] [d/l] [select] |
Re: Is regex 1 covered by regex 2
by MidLifeXis (Monsignor) on Apr 29, 2015 at 12:29 UTC
|
++ and Front-paged not so much for this particular question, but for the discussion on the general concept: identifying sub-grammars within a larger grammar. A topic like this tickles my CS fancy, and I am looking forward to the discussion it could generate.
| [reply] |