Re: Analysis of Regular Expressions

Replies are listed 'Best First'.
Re^2: Analysis of Regular Expressions by LanX (Saint) on Mar 17, 2010 at 11:43 UTC
> > Oh - yes and the computation of the "genericity/specifity" should be fast. > Hahahahaha. It's a little early for April 1. LOL, JavaFan++ =) Cheers Rolf	[reply]
Re^3: Analysis of Regular Expressions by PetaMem (Priest) on Mar 17, 2010 at 16:46 UTC
Uh.. boys... (JavaFan and LanX) gimme a break... I mean yes, I got my master in CS 13 years ago and there's no need to remind me of Gödel, the halting problem and similar. And YES, you are theoretically right. And NO, you are practically (especially JavaFan) way off. There are no countable infinite sets on finite computers, and if you have a FINITE SET of input strings, the cardinality of the sets that represent the results of the presented regexs may very well differ and thus be sortable. So your ivory tower arguments are mostly for the dustbin... I know, that for a Java Fan it might be very hard to adopt the Perl-inherent DWIM scheme when interpreting questions, but why LanX jumps on that bandwaggon is beyond me. Fotunately I got already some interesting hints in this thread and am working on a SOLUTION. Yes - will be presented for discussion. Bye PetaMem All Perl: MT, NLP, NLU	[reply]
Re^4: Analysis of Regular Expressions by LanX (Saint) on Mar 17, 2010 at 22:18 UTC
> I know, that for a Java Fan it might be very hard to adopt the Perl-inherent DWIM scheme when interpreting questions, but why LanX jumps on that bandwaggon is beyond me. Come on, keep cool! I just can't resist a good pun! =) Just think of me as too ignorant as to fully understand your question... ;-) > and if you have a FINITE SET of input strings, the cardinality of the sets that represent the results of the presented regexs may very well differ and thus be sortable. So your ivory tower arguments are mostly for the dustbin... true, but you only limited the sets of input strings and not the sets of regexes or allowed regex commands. IIRC Friedl (or perldocs?) describes in his book a regex with only two nested parens and quantifiers which would recursively backtrack for years! Without a detailed description we have to answer the question in generality, and fast solutions are not really likely! Cheers Rolf	[reply]
Re^4: Analysis of Regular Expressions by JavaFan (Canon) on Mar 18, 2010 at 00:27 UTC
There are no countable infinite sets on finite computers, and if you have a FINITE SET of input strings, the cardinality of the sets that represent the results of the presented regexs may very well differ and thus be sortable. True. Let's do some arithmetic. Let's take a computer with 1 kB of memory - not unreasonable in 2010. For argument sakes, ignore any other memory, including disk space. Limiting the problem space to all strings that can be stored on disk, you'd have 256¹⁰²⁴ finite strings to work with. 256¹⁰²⁴ is a finite number. Let's assume that each yoctosecond (10^-24s), you can analyse yotta-strings (10²⁴ strings). In other words, each second, you'd be able to determine of 10⁴⁸ ≅ 2¹⁶⁰ strings whether they match the given regexes. Then you still need about 10²⁴¹⁸ seconds to examine them all. The age of the universe is estimated to be about 13.75 billion years, which is about 4.4 * 10¹⁷ seconds. So, yeah, there are no countable infinite sets on computers. But I don't think it's really practical to do a brute-force calculation of all the possible FINITE strings a computer can hold.	[reply]
Re^4: Analysis of Regular Expressions (counterexample of slow regexes) by LanX (Saint) on Mar 19, 2010 at 13:14 UTC
OK... lets go explicit, the following (halting!) counterexamples use fairly short input strings and fairly simple regexes, nevertheless resulting in exploding run times... `perl -e ' for $n(10..14) { $str = "x"x($n2+1); $t=time; $str =~ /(x+x+){$n}y/; print "\$str:$str n:$n time:",time-$t,"sec\n" }' $str:xxxxxxxxxxxxxxxxxxxxx n:10 time:0sec $str:xxxxxxxxxxxxxxxxxxxxxxx n:11 time:3sec $str:xxxxxxxxxxxxxxxxxxxxxxxxx n:12 time:9sec $str:xxxxxxxxxxxxxxxxxxxxxxxxxxx n:13 time:39sec $str:xxxxxxxxxxxxxxxxxxxxxxxxxxxxx n:14 time:156sec` [download] I hope it's now obvious that an empiric brute force approach can only (if ever) work with rigid restrictions! (credits go to ratazong for finding the base principle in this example after some `/msg`ing and credits to his unwillingness to post it afterwards ... ;) Cheers Rolf UPDATE: improved code... UPDATE just for fun :) perl -e ' for $n(1..14) { $str = "x"x($n3+1); $t=time; $str =~ /(x+x+x+){$n}y/; print "\$str:$str n:$n time:",time-$t,"sec\n" }' $str:xxxx n:1 time:0sec $str:xxxxxxx n:2 time:0sec $str:xxxxxxxxxx n:3 time:0sec $str:xxxxxxxxxxxxx n:4 time:0sec $str:xxxxxxxxxxxxxxxx n:5 time:0sec $str:xxxxxxxxxxxxxxxxxxx n:6 time:0sec $str:xxxxxxxxxxxxxxxxxxxxxx n:7 time:1sec $str:xxxxxxxxxxxxxxxxxxxxxxxxx n:8 time:9sec $str:xxxxxxxxxxxxxxxxxxxxxxxxxxxx n:9 time:73sec $str:xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx n:10 time:555sec ^C [download]	[reply] [d/l] [select]