Searching for regular expression overlaps

Giardap has asked for the wisdom of the Perl Monks concerning the following question:

Replies are listed 'Best First'.
Re: Searching for regular expression overlaps by LanX (Saint) on Aug 12, 2016 at 16:27 UTC
Perl's regexp are a language on their own, if you REALLY² allow the full spectrum of features this is not imaginable for me.° I'm sure there are some computer science theorems proving this. (probably Halting problem?) The only way I could see is with a restricted rules language, which is translated to regexp later. Most likely a XY Problem ... ... monitoring if newly added rules also match already matched filenames and raising an alarm should be good enough. Cheers Rolf _{(addicted to the Perl Programming Language and ☆☆☆☆ :) Je suis Charlie!} °) as a side note, you can also embed Perl-code in regexes ²) from arbitrary sources ... that's crazy	[reply]
Re: Searching for regular expression overlaps by BrowserUk (Patriarch) on Aug 12, 2016 at 19:13 UTC
I agree with LanX. Even if this is possible, it is the wrong solution, whatever the problem. Using regex, rather than wildcards for matching filenames is like using a bazooka to swat flies. See glob, File::Glob, File::BSDGlob etc. In general, rule-based processing is hierarchical. That is, earlier rules 'override' later ones. Ie. the last rule is a catch-all: .; the preceding level is: .txt, .exe, .png etc.; the level before that is: th_.png etc.; and the top level are individual files: xyz.exe, th_qpr.png. If you describe the problem you need to solve -- rather than the problem with your failed solution -- you'll likely get better answers. With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday' Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error. "Science is about questioning the status quo. Questioning authority". I knew I was on the right track :) In the absence of evidence, opinion is indistinguishable from prejudice.	[reply]
Re: Searching for regular expression overlaps by AnomalousMonk (Archbishop) on Aug 12, 2016 at 17:35 UTC
Maybe take a look at chapter 6.5 "Regex String Generation" of Dominus's excellent "Higher-Order Perl" book (free download here). Sez MJD: Given a regex, how can one generate a list of all the strings matched by the regex? The problem can be rather difficult to solve. But the solution with streams is straightforward and compact. So there you have it: "rather difficult to solve," "straightforward and compact," what are we waiting for? Give a man a fish: `<%-{-{-{-<`	[reply] [d/l]
Re^2: Searching for regular expression overlaps by LanX (Saint) on Aug 12, 2016 at 20:27 UTC
Thanks for the link, it's a fascinating read (again). some things most be noted MJD is restricting himself to a (very powerful) subset of possible regexes. E.g. he's neither covering look ahead, look behind nor embedded Perl. He doesn't cover the question of how to translate a regex to his stream representation. BUT fascinating enough he shows later that he can produce a sorted output: first by length and second lexicographic. That means one could probably create a "pragmatic" comparison of two streams, i.e. comparing the leading substrings of all solutions up to a certain length and quickly identifying "incompatible" regexes... If there is still an overlap possible after n characters one could reject a regex for being too "similar" (potentially compatible). Well theoretically, I won't try to implement this. .. :) Cheers Rolf _{(addicted to the Perl Programming Language and ☆☆☆☆ :) Je suis Charlie!}	[reply]
Re^3: Searching for regular expression overlaps by AnomalousMonk (Archbishop) on Aug 12, 2016 at 23:23 UTC
I tend to agree with you and BrowserUk that what Giardap wants is probably the wrong solution to the wrong problem. And I won't, thank goodness, be implementing it either :) Give a man a fish: `<%-{-{-{-<`	[reply] [d/l]
Re^4: Searching for regular expression overlaps by LanX (Saint) on Aug 12, 2016 at 23:56 UTC
Re^2: Searching for regular expression overlaps by LanX (Saint) on Aug 12, 2016 at 18:36 UTC
Does calculating overlaps of two regexes really belong to the same problem class like calculating all matches per regex and checking for overlap? I doubt this. Cheers Rolf _{(addicted to the Perl Programming Language and ☆☆☆☆ :) Je suis Charlie!}	[reply]
Re: Searching for regular expression overlaps by flowdy (Scribe) on Aug 15, 2016 at 13:05 UTC
The simplest way to achieve that (roughly) is the ad-hoc method in connection with a test suite expanded with every string that has been tested, or at least a set of them cherrypicked by a human. Just do not stop at the first regex that matches an input, but always try to match against all regexes. If it happens that $matching_regexen > 1, you will have to roll up your sleeves in oder to apply optimisations once again, and to re-run your test suite afterwards so you make sure there are no regressions. Guess how many strings would match against m{.}. A hypothetical program as you imagine would never finish.	[reply]
Re: Searching for regular expression overlaps by Laurent_R (Canon) on Aug 13, 2016 at 09:12 UTC
Update: forget the post below, I was probably not awake when I wrote it. How can you be sure that your current 200 regexes don't overlap? With 200 regexes, checking each regex against each other basically means visiting 2 ** 199 combinations, i.e. 8e60 possibilities (more precisely: 803469022129495137770981046170581301261101496891396417650688 possibilities). ~~And with just 275 regexes, the number of combinations is larger than the estimated number of atoms in the universe.~~ Update (@ 09.26 UTC): Oops, wrong calculation. The number of combinations is actually `fact 200 / fact 2`, i.e. a number much larger than what I wrote above (a number with more than 370 digits). Actually, with just 60 regexes, the number of combinations exceeds the estimated number of atoms in the universe.	[reply] [d/l]
Re^2: Searching for regular expression overlaps by LanX (Saint) on Aug 13, 2016 at 10:34 UTC
I understood only pairwise comparisons are necessary So it's only quadratic: n(n-1)/2 = 19900 Cheers Rolf _{(addicted to the Perl Programming Language and ☆☆☆☆ :) Je suis Charlie!}	[reply]
Re^3: Searching for regular expression overlaps by Laurent_R (Canon) on Aug 13, 2016 at 17:45 UTC
Yes, Rolf, it seems that you're right. But 19900 is still far too many combinations for a manual check.	[reply]
Re^4: Searching for regular expression overlaps by LanX (Saint) on Aug 13, 2016 at 17:57 UTC