Re: Searching for regular expression overlaps
by LanX (Saint) on Aug 12, 2016 at 16:27 UTC
|
Perl's regexp are a language on their own, if you REALLY˛ allow the full spectrum of features this is not imaginable for me.°
I'm sure there are some computer science theorems proving this. (probably Halting problem?)
The only way I could see is with a restricted rules language, which is translated to regexp later.
Most likely a XY Problem ...
... monitoring if newly added rules also match already matched filenames and raising an alarm should be good enough.
°) as a side note, you can also embed Perl-code in regexes
˛) from arbitrary sources ... that's crazy | [reply] |
Re: Searching for regular expression overlaps
by BrowserUk (Patriarch) on Aug 12, 2016 at 19:13 UTC
|
I agree with LanX. Even if this is possible, it is the wrong solution, whatever the problem.
- Using regex, rather than wildcards for matching filenames is like using a bazooka to swat flies.
See glob, File::Glob, File::BSDGlob etc.
- In general, rule-based processing is hierarchical.
That is, earlier rules 'override' later ones.
Ie. the last rule is a catch-all: *.*; the preceding level is: *.txt, *.exe, *.png etc.; the level before that is: th_*.png etc.; and the top level are individual files: xyz.exe, th_qpr.png.
If you describe the problem you need to solve -- rather than the problem with your failed solution -- you'll likely get better answers.
With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday'
Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
In the absence of evidence, opinion is indistinguishable from prejudice.
| [reply] |
Re: Searching for regular expression overlaps
by AnomalousMonk (Archbishop) on Aug 12, 2016 at 17:35 UTC
|
Maybe take a look at chapter 6.5 "Regex String Generation" of Dominus's excellent "Higher-Order Perl" book (free download here). Sez MJD:
Given a regex, how can one generate a list of all the strings matched by the regex? The problem can be rather difficult to solve. But the solution with streams is straightforward and compact.
So there you have it: "rather difficult to solve," "straightforward and compact," what are we waiting for?
Give a man a fish: <%-{-{-{-<
| [reply] [d/l] |
|
|
Thanks for the link, it's a fascinating read (again).
some things most be noted
- MJD is restricting himself to a (very powerful) subset of possible regexes.
E.g. he's neither covering look ahead, look behind nor embedded Perl.
- He doesn't cover the question of how to translate a regex to his stream representation.
BUT fascinating enough he shows later that he can produce a sorted output: first by length and second lexicographic.
That means one could probably create a "pragmatic" comparison of two streams, i.e. comparing the leading substrings of all solutions up to a certain length and quickly identifying "incompatible" regexes...
If there is still an overlap possible after n characters one could reject a regex for being too "similar" (potentially compatible).
Well theoretically, I won't try to implement this. .. :)
| [reply] |
|
|
| [reply] [d/l] |
|
|
|
|
Does calculating overlaps of two regexes really belong to the same problem class like calculating all matches per regex and checking for overlap?
I doubt this.
| [reply] |
Re: Searching for regular expression overlaps
by flowdy (Scribe) on Aug 15, 2016 at 13:05 UTC
|
The simplest way to achieve that (roughly) is the ad-hoc method in connection with a test suite expanded with every string that has been tested, or at least a set of them cherrypicked by a human.
Just do not stop at the first regex that matches an input, but always try to match against all regexes. If it happens that $matching_regexen > 1, you will have to roll up your sleeves in oder to apply optimisations once again, and to re-run your test suite afterwards so you make sure there are no regressions.
Guess how many strings would match against m{.}. A hypothetical program as you imagine would never finish.
| [reply] |
Re: Searching for regular expression overlaps
by Laurent_R (Canon) on Aug 13, 2016 at 09:12 UTC
|
Update: forget the post below, I was probably not awake when I wrote it.
How can you be sure that your current 200 regexes don't overlap? With 200 regexes, checking each regex against each other basically means visiting 2 ** 199 combinations, i.e.
8e60 possibilities (more precisely: 803469022129495137770981046170581301261101496891396417650688 possibilities).
And with just 275 regexes, the number of combinations is larger than the estimated number of atoms in the universe.
Update (@ 09.26 UTC): Oops, wrong calculation. The number of combinations is actually fact 200 / fact 2, i.e. a number much larger than what I wrote above (a number with more than 370 digits). Actually, with just 60 regexes, the number of combinations exceeds the estimated number of atoms in the universe.
| [reply] [d/l] |
|
|
| [reply] |
|
|
| [reply] |
|
|