in reply to Natural Language Sentence Production

Just in case it helps people trying to solve a similar problem:

Probably yagg is the right tool for that.

Though Parse::Eyapp was conceived for parsing, versions 1.137 and later provide support for building a phrase generator from a grammar specification. If you want to know more, read the tutorial Parse::Eyapp::datagenerationtut. The example used there produces sequences of assignment statements:

    Parse-Eyapp/examples/generator$ ./Generator.pm
    # result: -710.2
    I=(3*-8+7/5);
    R=2+8*I*4+5*2+I/I

To specify the language we write a yacc-like grammar, but instead of writing the classical lexer (i.e. one that scans the input to produce the next token), we write a token generator: each time our lexical analyzer is called, it checks the list of expected tokens (available via the method YYExpect) and produces one of them, following some probability distribution. This is the grammar for the calculator:

    Parse-Eyapp/examples/generator$ cat Generator.eyp
    # file: Generator.eyp
    # compile with: eyapp -b '' Generator.eyp
    # then run: ./Generator.pm
    %strict
    %token NUM VARDEF VAR

    %right  '='
    %left   '-' '+'
    %left   '*' '/'
    %left   NEG
    %right  '^'

    %defaultaction {
      my $parser = shift;

      return join '', @_;
    }

    %{
    use base q{Parse::Eyapp::TokenGen};
    use base q{GenSupport};
    %}

    %%

    stmts:
        stmt
          { # At least one variable is defined now
            $_[0]->deltaweight(VAR => +1);
            $_[1];
          }
      | stmts ';' { "\n" } stmt
    ;

    stmt:
        VARDEF '=' exp
          {
            my $parser = shift;
            $parser->defined_variable($_[0]);
            "$_[0]=$_[2]";
          }
    ;
    exp:
        NUM
      | VAR
      | exp '+' exp
      | exp '-' exp
      | exp '*' exp
      | exp '/' exp
      | '-' { $_[0]->pushdeltaweight('-' => -1) }
        exp %prec NEG {
          $_[0]->popweight();
          "-$_[3]"
        }
      | exp '^' exp
      | '('   { $_[0]->pushdeltaweight( '(' => -1, ')' => +1, '+' => +1, ); }
          exp
        ')'
          {
            $_[0]->popweight;
            "($_[3])"
          }
    ;

    %%

    unless (caller) {
      __PACKAGE__->main(@ARGV);
    }

The difficult part is managing the probability distribution so that the generator produces reasonable phrases and avoids very long statements. The generation of tokens and their attributes uses Test::LectroTest::Generator. The support subroutines have been isolated in the module GenSupport.pm (see http://cpansearch.perl.org/src/CASIANO/Parse-Eyapp-1.137/examples/generator/GenSupport.pm ).
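
To make the idea of weighted token choice more concrete, here is a minimal standalone sketch in plain Perl. It is not the code of Parse::Eyapp::TokenGen or GenSupport.pm: the names pick_token and with_delta, the weight table, and the optional random-source argument are all hypothetical and chosen only for illustration. It shows how one might pick a token among the expected ones with probability proportional to its weight, and how a temporary weight change (similar in spirit to what the grammar does with pushdeltaweight and popweight) can discourage deep nesting.

```perl
#!/usr/bin/env perl
# Hypothetical sketch: none of these names belong to Parse::Eyapp.
use 5.010;
use strict;
use warnings;

# Pick one token among @$expected with probability proportional to its weight.
# $rand is an optional coderef returning a number in [0, 1); defaults to rand().
sub pick_token {
    my ($weight, $expected, $rand) = @_;
    $rand ||= sub { rand() };

    # Tokens with a missing or non-positive weight are never produced.
    my @candidates = grep { ($weight->{$_} // 0) > 0 } @$expected;
    return unless @candidates;

    my $total = 0;
    $total += $weight->{$_} for @candidates;

    # Walk the candidates, subtracting weights until the threshold is crossed.
    my $threshold = $rand->() * $total;
    for my $token (@candidates) {
        $threshold -= $weight->{$token};
        return $token if $threshold < 0;
    }
    return $candidates[-1];    # guard against floating-point rounding
}

# Return a modified copy of the weights and leave the original untouched,
# so the caller can go back to the old table when the sub-phrase is finished.
sub with_delta {
    my ($weight, %delta) = @_;
    my %copy = %$weight;
    $copy{$_} = ($copy{$_} // 0) + $delta{$_} for keys %delta;
    return \%copy;
}

# Assumed weights: VAR stays at 0 until some variable has been defined.
my %weight   = (NUM => 4, VAR => 0, '(' => 1, '-' => 1);
my @expected = ('NUM', 'VAR', '(', '-');

# Outside any parenthesis: NUM, '(' and '-' are all possible.
say pick_token(\%weight, \@expected);

# Inside a parenthesis: '(' drops to weight 0, so no further nesting.
my $inside = with_delta(\%weight, '(' => -1);
say pick_token($inside, \@expected);
```

Keeping the weights in a plain hash and copying it for each nested construct is only one possible design; a stack of weight tables, as the method names pushdeltaweight and popweight suggest, would serve the same purpose.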