A little warning: perl's yacc grammar (perly.y) is 763 lines long, while the lexer (toke.c) is 10998 lines long. That should be a clue as to how much special-case code perl needs that the grammar can't handle by itself.
I was in a conversation the other day where Larry said that the reason he did that was to be able to say that Perl 5 had a smaller parser than did Perl 4. Now that he's refactoring the tokenizer, I think he regrets that choice somewhat.
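
To give a feel for the kind of special case that ends up in a lexer rather than in the grammar, here is a toy sketch in Python. It is not taken from toke.c and is nowhere near as thorough; every name in it (`tokenize`, `expect_term`, the token kinds) is made up for illustration. It covers one well-known Perl ambiguity: a `/` is the division operator when it follows an operand, but starts a regex when an operand is expected. The tokenizer has to remember which situation it is in, because the characters alone don't say.

```python
import re

# Toy token patterns; the names are hypothetical and unrelated to toke.c.
NUMBER = re.compile(r"\d+")
IDENT = re.compile(r"[A-Za-z_]\w*")
REGEX = re.compile(r"/(?:[^/\\]|\\.)*/")


def tokenize(src):
    """Split src into (kind, text) pairs, tracking whether a term is expected."""
    tokens = []
    pos = 0
    expect_term = True  # at the start of input an operand is expected
    while pos < len(src):
        ch = src[pos]
        if ch.isspace():
            pos += 1
            continue
        # The context-sensitive part: '/' where an operand is expected
        # begins a regex literal instead of being a division operator.
        if expect_term and ch == "/":
            m = REGEX.match(src, pos)
            if m is None:
                raise SyntaxError(f"unterminated regex at offset {pos}")
            tokens.append(("REGEX", m.group()))
            pos = m.end()
            expect_term = False
            continue
        m = NUMBER.match(src, pos) or IDENT.match(src, pos)
        if m:
            kind = "NUMBER" if m.group()[0].isdigit() else "IDENT"
            tokens.append((kind, m.group()))
            pos = m.end()
            expect_term = False  # after an operand, '/' means division
            continue
        if src.startswith("=~", pos):
            tokens.append(("OP", "=~"))
            pos += 2
            expect_term = True
            continue
        if ch in "+-*/":
            tokens.append(("OP", ch))
            pos += 1
            expect_term = True  # after an operator, '/' starts a regex
            continue
        raise SyntaxError(f"unexpected character {ch!r} at offset {pos}")
    return tokens


print(tokenize("x / 2 / y"))   # both slashes come out as OP tokens
print(tokenize("x =~ / 2 /"))  # the same slashes now delimit one REGEX token
```

Note that the grammar never sees the difference: it just receives either `OP` tokens or a single `REGEX` token. All of the deciding happens in the tokenizer's little bit of state, and real Perl has a great many more cases like this one.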