In case it helps anyone trying to solve a similar problem:

yagg is probably the right tool for that.

Although Parse::Eyapp was conceived for parsing, versions 1.137 and later also support building a phrase generator from a grammar specification. If you want to know more, read the tutorial Parse::Eyapp::datagenerationtut. The example used there produces sequences of assignment statements:

    Parse-Eyapp/examples/generator$ ./Generator.pm
    # result: -710.2
    I=(3*-8+7/5);
    R=2+8*I*4+5*2+I/I

To specify the language we write a yacc-like grammar, but instead of writing the classical lexer (i.e. one that scans the input to produce the next token), we write a token generator: each time our lexical analyzer is called, it checks the list of expected tokens (available via the method YYExpect) and produces one of them, chosen according to some probability distribution. This is the grammar for the calculator:

    Parse-Eyapp/examples/generator$ cat Generator.eyp
    # file: Generator.eyp
    # compile with: eyapp -b '' Generator.eyp
    # then run: ./Generator.pm
    %strict
    %token NUM VARDEF VAR

    %right '='
    %left '-' '+'
    %left '*' '/'
    %left NEG
    %right '^'

    %defaultaction {
      my $parser = shift;

      return join '', @_;
    }

    %{
    use base q{Parse::Eyapp::TokenGen};
    use base q{GenSupport};
    %}

    %%

    stmts:
        stmt
          { # At least one variable is defined now
            $_[0]->deltaweight(VAR => +1);
            $_[1];
          }
      | stmts ';' { "\n" } stmt
    ;

    stmt:
        VARDEF '=' exp
          {
            my $parser = shift;
            $parser->defined_variable($_[0]);
            "$_[0]=$_[2]";
          }
    ;
    exp:
        NUM
      | VAR
      | exp '+' exp
      | exp '-' exp
      | exp '*' exp
      | exp '/' exp
      | '-' { $_[0]->pushdeltaweight('-' => -1) }
        exp %prec NEG {
          $_[0]->popweight();
          "-$_[3]"
        }
      | exp '^' exp
      | '(' { $_[0]->pushdeltaweight( '(' => -1, ')' => +1, '+' => +1, ); }
          exp
        ')'
          {
            $_[0]->popweight;
            "($_[3])"
          }
    ;

    %%

    unless (caller) {
      __PACKAGE__->main(@ARGV);
    }

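To make the token generator idea more concrete, here is a minimal sketch in plain Perl of choosing one token among the expected ones with probability proportional to a weight. It is not the Parse::Eyapp::TokenGen or GenSupport code: the sub pick_token, the %weight table and the hard-coded list of expected tokens are all hypothetical and only illustrate the idea.

```perl
use strict;
use warnings;

# Hypothetical weight table: a larger weight makes a token more likely.
# VAR starts at 0 because no variable has been defined yet.
my %weight = (NUM => 4, VAR => 0, '(' => 1, '-' => 1);

# Choose one token among @expected with probability proportional to its
# weight. Returns undef when no expected token has a positive weight.
sub pick_token {
    my ($weights, @expected) = @_;
    my @candidates = grep { ($weights->{$_} // 0) > 0 } @expected;
    my $total = 0;
    $total += $weights->{$_} for @candidates;
    return unless $total > 0;

    my $r = rand($total);    # uniform in [0, $total)
    for my $token (@candidates) {
        return $token if ($r -= $weights->{$token}) < 0;
    }
    return $candidates[-1];    # guard against floating-point rounding
}

# In the real generator the expected tokens would come from the parser
# (the post mentions YYExpect); here the list is hard-coded (assumption).
my $token = pick_token(\%weight, 'NUM', 'VAR', '(', '-');
print "next token: $token\n";
```

With these weights VAR can never be produced, which mirrors the way the grammar only raises the weight of VAR after the first statement has defined a variable.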
The difficult part is managing the probability distribution so that the generated phrases are reasonable and the statements do not become very long. The generation of tokens and their attributes uses Test::LectroTest::Generator. The support subroutines have been isolated in the module GenSupport.pm (see http://cpansearch.perl.org/src/CASIANO/Parse-Eyapp-1.137/examples/generator/GenSupport.pm ).
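One way to picture the weight management is a stack of weight tables: entering a construct saves the current weights and applies some deltas, and leaving it restores the saved table. The sketch below is a hypothetical stand-in written for illustration (the package WeightStack and its methods push_delta, pop_weights and weight are invented names). It is not the GenSupport.pm implementation, and the real pushdeltaweight and popweight may behave differently.

```perl
use strict;
use warnings;

# Hypothetical stand-in for the weight bookkeeping; not the GenSupport code.
package WeightStack;

sub new {
    my ($class, %weights) = @_;
    return bless { current => {%weights}, saved => [] }, $class;
}

# Save a copy of the current table, then add each delta to its token's weight.
sub push_delta {
    my ($self, %delta) = @_;
    push @{ $self->{saved} }, { %{ $self->{current} } };
    $self->{current}{$_} = ($self->{current}{$_} // 0) + $delta{$_}
        for keys %delta;
    return $self;
}

# Restore the table saved by the matching push_delta, if there is one.
sub pop_weights {
    my ($self) = @_;
    $self->{current} = pop @{ $self->{saved} } if @{ $self->{saved} };
    return $self;
}

sub weight {
    my ($self, $token) = @_;
    return $self->{current}{$token} // 0;
}

package main;

my $w = WeightStack->new('(' => 1, ')' => 0, '+' => 1, NUM => 4);

$w->push_delta('(' => -1, ')' => +1, '+' => +1);    # entering a parenthesis
printf "inside:  ( = %d, ) = %d\n", $w->weight('('), $w->weight(')');

$w->pop_weights;                                    # leaving it
printf "outside: ( = %d, ) = %d\n", $w->weight('('), $w->weight(')');
```

Inside the parenthesis the weight of '(' drops to 0, so no further nesting would be generated, and ')' becomes possible. After the pop, the original weights are back. Lowering a weight while inside a construct is what keeps the generated expressions from growing without bound.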


In reply to Re: Natural Language Sentence Production by casiano
in thread Natural Language Sentence Production by japhy
