I'm sorry to say this, but you're at least partially mistaken. (And yes, I have been reading The Dragon Book recently... quite a lot, in fact.)
Your mistake lies in your definition of the term context. In all the posts of yours I've seen so far, you're using that word to mean what language developers call state, not context in the Chomskian sense, which differentiates:
- Type 3 Grammars aka: Regular Expressions
- Type 2 Grammars aka: Context-Free Grammars
- Type 1 Grammars aka: N-Character Lookahead Grammars (more standardly known as Context-Sensitive Grammars)
- and Type 0 Grammars aka: Recursively Enumerable Languages
A Regular Expression has state in the sense that the lexer expects to see a given set of characters in a given order. The key fact about state is that the current state-value tells the lexer everything it needs to know about the sequence of input it's seen so far.
For example, the regular expression a+b+ has three states:
- State 0: no input has been read at all.
- State 1: the input so far has been an unbroken sequence of 'a's (a+)
- State 2: the input started with a sequence of 'a's, but is now an unbroken sequence of 'b's (a+b+)
From state 0, if the lexer sees an 'a', it moves to state 1. If the lexer sees another 'a' in state 1, it stays in state 1. If the lexer sees a 'b' in state 1, it advances to state 2. If the lexer sees another 'b' in state 2, it stays in state 2.
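Those transitions are easy to sketch in code. Here's a minimal hand-rolled DFA in Python (my own illustration, using exactly the three states described above):

```python
# A hand-rolled DFA for the regular expression a+b+, using the three
# states described above. Illustrative sketch only.

def matches_a_plus_b_plus(s):
    state = 0  # State 0: no input has been read at all
    for ch in s:
        if state == 0 and ch == 'a':
            state = 1          # first 'a' seen: now inside a+
        elif state == 1 and ch == 'a':
            state = 1          # still reading the unbroken run of 'a's
        elif state == 1 and ch == 'b':
            state = 2          # switched over to b+
        elif state == 2 and ch == 'b':
            state = 2          # still reading the unbroken run of 'b's
        else:
            return False       # any other transition is a dead end
    return state == 2          # must end having seen at least one 'b'

print(matches_a_plus_b_plus("aaabb"))  # True
print(matches_a_plus_b_plus("ba"))     # False
```

Notice that the single integer `state` really is everything the lexer knows about the input so far, which is the point.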
Clearly you have to feed the lexer the right input in the right order. That is stateful behavior, but it is not contextual behavior in the Chomskian sense.
A Context-Free Grammar is defined in terms of productions, each of which amounts to its own state machine. The trick here is that the productions can refer to each other recursively, which makes it impossible to recognize a well-formed production from the immediate state alone, the way a regular expression can.
For example, the Context-Free Grammar that represents nested parentheses looks like so:
group -> (group)
| e
where the 'e' stands for 'epsilon', which equals a null string.
We can't match that grammar with a regular expression, because just knowing that we've seen an unbroken sequence of left-parens doesn't tell us how many right-parens we need to see to balance the production. We can recognize this grammar by pushing state 1 (saw a left-paren) on a stack every time we come to the 'group' part, though. Then we start back at state 0 (no input so far), until we come to a null string between parens. Then we start popping our way out through the stack, advancing to state 2 every time we see a right paren.
This is still stateful behavior, though. The parser still carries all the information it needs in the current state. We just have to add a few more rules to carry state information across the pushes and pops, i.e.: every push takes the system to state 0 of the upcoming production, and every pop takes the system to the next state in the outer production.
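The push-and-pop recognizer described above can be sketched in a few lines of Python (again, my own illustration; note that this grammar generates only fully nested strings like '((()))', not sequences like '()()'):

```python
# A stack-based recognizer for the grammar: group -> (group) | e
# Every left paren pushes "state 1" onto a stack; every right paren
# pops back out through the enclosing productions.

def is_nested_group(s):
    stack = []
    i = 0
    # every '(' pushes us into state 0 of a new 'group' production
    while i < len(s) and s[i] == '(':
        stack.append(1)        # state 1: saw a left paren
        i += 1
    # at the null string between the parens, start popping our way out
    while i < len(s) and s[i] == ')' and stack:
        stack.pop()            # state 2: matched a right paren
        i += 1
    # well-formed iff we consumed all input and emptied the stack
    return i == len(s) and not stack

print(is_nested_group("((()))"))  # True
print(is_nested_group("(()"))     # False
```

The stack is the extra machinery that a plain finite-state machine lacks, and it is exactly what lets the parser remember how many right parens it still owes.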
In an N-Character Lookahead Grammar, the productions are ambiguous. You end up with some combination of symbols that can match more than one production. However, that ambiguity is always guaranteed to resolve itself within N characters of the ambiguous sequence.
For example, the grammar:
foo -> A+
bar -> B+
baz -> foo bar
quux -> foo B C
contains three terminal symbols, A, B, and C, and two ambiguous productions: 'baz' and 'quux'. When the parser sees a series of 'A's followed by one B, it doesn't know whether it's seeing a complete 'baz', or a partial 'quux'.
The parser can solve the ambiguity by looking at the next input symbol, though. If the next symbol is a B, the parser can be certain that it's looking at a 'baz'. If the next symbol is a C, the parser can be certain that it's looking at a 'quux'. Therefore, the above grammar requires 1 character of lookahead for the parser to completely determine the validity (and identity) of an input string.
For Chomsky's meaning of the word 'context', the above grammar is contextual within a one-character linear bound.
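Here's that one-character-lookahead decision written out in Python (a toy classifier of my own; tokens are single characters 'A', 'B', 'C'):

```python
# A toy parser for the grammar:
#   foo -> A+    bar -> B+    baz -> foo bar    quux -> foo B C
# After matching A+ followed by one B, 'baz' and 'quux' are ambiguous;
# one token of lookahead resolves it.

def classify(tokens):
    i = 0
    if i >= len(tokens) or tokens[i] != 'A':
        return None
    while i < len(tokens) and tokens[i] == 'A':   # foo -> A+
        i += 1
    if i >= len(tokens) or tokens[i] != 'B':
        return None
    i += 1   # consumed one B; at this point baz and quux are both possible
    # the one-character lookahead: a C means quux, anything else means baz
    if i < len(tokens) and tokens[i] == 'C':
        return 'quux' if i + 1 == len(tokens) else None   # quux -> foo B C
    while i < len(tokens) and tokens[i] == 'B':           # bar -> B+
        i += 1
    return 'baz' if i == len(tokens) else None

print(classify("AAABBB"))  # baz
print(classify("AABC"))    # quux
```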
In a Recursively Enumerable Grammar almost all bets are off. The most general description for an REG is "the set of all strings on which a given Turing machine will halt."
Please note that this definition does not contain the 'well-formed algorithm' condition, 'and halts in a reasonable (finite) amount of time.' That's the halting problem in grammatical form: you will eventually get a list of all strings on which the machine halts... you just have to let some of them run not-quite-forever.
-----
In other matters, the requirement that a symbol be defined before it can be used is a semantic rule, not a syntactic rule. The parser won't catch that kind of error; the compiler will catch it during syntax-directed analysis of the parse tree.
To flip that idea on its head, the fact that you failed to define a symbol before using it doesn't make the statement which uses that symbol syntactically incorrect. It's semantically wrong, yes, but the statement itself can still be well-formed.
Grammar and semantics have only a nodding acquaintance in most programming languages.
-----
To get back to the original question, no, Perl is not Context-Free. It has a set of operator-precedence rules that amount to N-character lookahead, thus making it a Type-1 grammar.
One example of those rules would be the expression: $x + $y ** 2. The parser has to look ahead to see the exponentiation operator before it can decide whether to interpret the expression as ($x + $y)**2 or as $x + ($y ** 2). That makes it context-sensitive, and thus not a CFG.
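To make that lookahead concrete, here is a minimal precedence-climbing sketch (my own toy code, not Perl's actual parser) that peeks at the next operator token to decide the grouping:

```python
# A minimal precedence-climbing parser. It has to peek at the next
# operator (tokens[0]) before deciding whether the right operand binds
# tighter, which is exactly the lookahead described above.

PREC = {'+': 1, '**': 2}

def parse_expr(tokens, min_prec=0):
    # tokens is a list like ['x', '+', 'y', '**', '2']
    lhs = tokens.pop(0)
    while tokens and PREC[tokens[0]] >= min_prec:
        op = tokens.pop(0)
        # '**' is right-associative in Perl; '+' is left-associative
        next_min = PREC[op] if op == '**' else PREC[op] + 1
        rhs = parse_expr(tokens, next_min)
        lhs = (op, lhs, rhs)
    return lhs

print(parse_expr(['x', '+', 'y', '**', '2']))
# ('+', 'x', ('**', 'y', '2'))  -- i.e., x + (y ** 2), not ($x + $y) ** 2
```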