Re^2: check for square-number with a regex
by moritz (Cardinal) on Oct 23, 2009 at 15:03 UTC
|
When we talk about a regex in the context of Perl, we don't mean Regular Expression in the sense that a computer scientist would use (ie Typ-3 language of the Chomsky hierarchy), but rather "the thing that you but in m/... in Perl".
These regexes are much more powerful than regular expressions, in fact they can do things that even Typ-2 langauges (context-free languages) can't. I've not yet seen a proof that they are Turing complete, so I don't know if they are.
Update: fixed Chomsky classification, thanks jakobi
Perl 6 - links to (nearly) everything that is Perl 6.
| [reply] [d/l] |
|
|
These regexes are much more powerful than regular expressions, in fact they can do things that even Typ-1 langauges (context-free languages) can't. I've not yet seen a proof that they are Turing complete, so I don't know if they are.
In a sense they're ‘obviously’ Turing-complete, since they can contain embedded Perl code; but I assume that's not what you meant. What's another example of Perl regexes recognising a language that a CFG can't? (It sounds vaguely like a challenge, but I'm just politely curious.)
| [reply] |
|
|
First of all sorry for getting the numbering wrong, I misremembered REG as being Typ-0 and Turing as being Typ-3, it's the other way round.
What's another example of Perl regexes recognising a language that a CFG can't?
m/^(\w+)\1$/
Is already outside CFG.
CFG is parsed by a non-deterministic finite automaton which has access to a stack of infinite size.
Since it can only access the top element of the stack, it has only access to the capture in reverse order, or with a finite look-behind on the stack (implemented by storing some values in the state of the NFA). A more formal proof can be made with the Pumping lemma for context-free languages.
I also think that the regex primality test is outside of CFG, but I've never proven that, so don't rely on it ;-)
| [reply] [d/l] |
|
|
Re^2: check for square-number with a regex
by jakobi (Pilgrim) on Oct 23, 2009 at 15:07 UTC
|
Albeit that argument assumes "classical" regexes. No backrefs like \1 nor any of the 'comfort' in egrep, PCRE, Perl or - even worse - Perl's (?{}), all of which violate Chomsky-3 AFAIR. Then again even Chomsky-3 can be surprising powerful compared to other 'machines' - just limit the others' allowed tape and/or word size... .
(Moritz: small typo: DFA's ch-3, with ch-0 being TM)
Wasn't there some recent discussion or article on Perl6 extensions to regexes (or just on Parsing Expression Grammars/PEG; LUA version maybe?) that contained some comparison to the Chomsky Hierarchy?
| [reply] |
Re^2: check for square-number with a regex
by JadeNB (Chaplain) on Oct 23, 2009 at 15:08 UTC
|
Note that it's only ‘pure’ regexes (for which Ratazong did indicate a preference) that are equivalent to finite automata. I know that back-references make regexes essentially push-down automata. Are Perl's regexes still ‘only’ that powerful? I guess that they're not, really, because they can execute embedded code; but that's not playing fair. | [reply] |
|
|
Backreferences are neither enough, nor necessary to make up the difference between languages that are recognized by FAs ("regular expressions") and those that are recognized by PDAs ("context-free grammars").
As I pointed out, a classical context-free language that isn't a regular expression is anbn. You cannot match that with 5.8 regular expressions (assuming no jumping back to the interpreter using (?{}) or (??{})), proving that backreferences aren't enough. But in 5.10, you can write /^(a(?1)b)$/, which doesn't use backreferences. It does use recursion, but recursion is the difference between an FA and a PDA.
Now, I think that due to backreferences, Perl regular expressions are more powerful than context-free grammars, but I cannot think of a language that is matched by a Perl regular expression, and not by a context-free grammar. And I don't think Perl regular expressions are as powerful as context sensitive grammars (which requires a TM). For instance, I don't think there's a Perl regular expression to match anbncn.
| [reply] [d/l] |
|
|
I think you're aware of this solution for anbncn but I'll post it anyway(it uses ??{} blocks and ?{} and is a Perl regex).
$r = qr/^(a+b)(?{ $a = -1+length $^N; })(??{ $b=$a-1;"(b{$b})" })(??{
+"(c{$a})" })/x;
sub test {
print
$_[0] =~ $r
? "matched\n"
: "not matched\n";
}
test "aaabbbccc";
test "aaaabbbccc";
test "aaaabbbbccc";
test "aaaabbbbcccc";
test "aabbbbcccc";
test "abc";
test "aabbcc";
Result on Perl v5.8.8:
matched
not matched
not matched
matched
not matched
matched
matched
| [reply] [d/l] [select] |
|
|
Re^2: check for square-number with a regex
by JavaFan (Canon) on Oct 23, 2009 at 15:10 UTC
|
But Perl regexes have backreferences. Which makes them more powerful than finite automata. Note that your first quote could also be used to prove you cannot accept strings that are a composite number. But /^(..+)\1+$/ proves that you can.
Note also that Perl 5.10 regexes have named rules, allowing recursion. Which means, IMO, that Perl regexes are at least as powerful as PDA (push down automata). Which means they should be able to accept any context insensitive language.
The classical example is that with a "regular expression" you cannot match strings for the form anbn, but that you need a PDA. /^(a(?1)?b)$/ matches such strings. | [reply] [d/l] [select] |
Re^2: check for square-number with a regex
by spx2 (Deacon) on Oct 23, 2009 at 15:14 UTC
|
I forgot to mention another source that gives as exercise to prove that {a^n| with n a square} is not recognizable as a language by a pushdown automaton.
You might be right about Perl regexes being more powerful. I honestly don't know so I'll call it a day. | [reply] |