Some time ago, I came to the monastery gates and received some much needed help in writing a list based mini language. Ikegami was very kind in helping me out (references A simple list oriented language in Perl and Vector math with recdescent), I’ve made a few simple changes that make this grammar behave the way I need. Basically you can do assignment and evaluation. Evaluation interpolates and autoflattens lists and variables using dot products (default composition and *) cross products (**) duplication (%) and replication (%%), and spits out interpolations that are new line delimited. Here is an example input:
hello hello{1,2} {a,b}{1,2} {a,b}**{1,2} temp->somestring someotherstring->{a,b}$temp$() $someotherstring$ {a,b}%{1,2} {a,b}%%{1,2}
and the output:
hello hello1 hello2 a1 b2 a1 a2 b1 b2 asomestring() bsomestring() a b a b a a b b
Some of the design choices that were made relate to the application that I am doing using the SMILES language. FWIW I can define sets of reactions (and combinatorial libraries) nicely by specifying them in this language. Here is an example of a set of reactions
amines->{CN(C),{C,c}N(H)} thiols->{{C,c}S} alcohols->{cO} sulfonamides->{{C,c}S(=O)(=O)N(H)} nSynthons->{$amines$,$thiols$,$alcohols$,$sulfonamides$} lGroups->{Cl,Br,I} eSynthons->{C(=O){C,c}} reactants->{ $nSynthons$(H).}**{$lGroups$**$eSynthons$ } products->{ $nSynthons$**{$eSynthons$%$lGroups$} } $reactants$>>$products$
and the output:
CN(C)(H).ClC(=O)C>>CN(C)C(=O)C CN(C)(H).ClC(=O)c>>CN(C)C(=O)c CN(C)(H).BrC(=O)C>>CN(C)C(=O)C CN(C)(H).BrC(=O)c>>CN(C)C(=O)c CN(C)(H).IC(=O)C>>CN(C)C(=O)C CN(C)(H).IC(=O)c>>CN(C)C(=O)c CN(H)(H).ClC(=O)C>>CN(H)C(=O)C CN(H)(H).ClC(=O)c>>CN(H)C(=O)c CN(H)(H).BrC(=O)C>>CN(H)C(=O)C CN(H)(H).BrC(=O)c>>CN(H)C(=O)c CN(H)(H).IC(=O)C>>CN(H)C(=O)C CN(H)(H).IC(=O)c>>CN(H)C(=O)c cN(H)(H).ClC(=O)C>>cN(H)C(=O)C cN(H)(H).ClC(=O)c>>cN(H)C(=O)c cN(H)(H).BrC(=O)C>>cN(H)C(=O)C cN(H)(H).BrC(=O)c>>cN(H)C(=O)c cN(H)(H).IC(=O)C>>cN(H)C(=O)C cN(H)(H).IC(=O)c>>cN(H)C(=O)c CS(H).ClC(=O)C>>CSC(=O)C CS(H).ClC(=O)c>>CSC(=O)c CS(H).BrC(=O)C>>CSC(=O)C CS(H).BrC(=O)c>>CSC(=O)c CS(H).IC(=O)C>>CSC(=O)C CS(H).IC(=O)c>>CSC(=O)c cS(H).ClC(=O)C>>cSC(=O)C cS(H).ClC(=O)c>>cSC(=O)c cS(H).BrC(=O)C>>cSC(=O)C cS(H).BrC(=O)c>>cSC(=O)c cS(H).IC(=O)C>>cSC(=O)C cS(H).IC(=O)c>>cSC(=O)c cO(H).ClC(=O)C>>cOC(=O)C cO(H).ClC(=O)c>>cOC(=O)c cO(H).BrC(=O)C>>cOC(=O)C cO(H).BrC(=O)c>>cOC(=O)c cO(H).IC(=O)C>>cOC(=O)C cO(H).IC(=O)c>>cOC(=O)c CS(=O)(=O)N(H)(H).ClC(=O)C>>CS(=O)(=O)N(H)C(=O)C CS(=O)(=O)N(H)(H).ClC(=O)c>>CS(=O)(=O)N(H)C(=O)c CS(=O)(=O)N(H)(H).BrC(=O)C>>CS(=O)(=O)N(H)C(=O)C CS(=O)(=O)N(H)(H).BrC(=O)c>>CS(=O)(=O)N(H)C(=O)c CS(=O)(=O)N(H)(H).IC(=O)C>>CS(=O)(=O)N(H)C(=O)C CS(=O)(=O)N(H)(H).IC(=O)c>>CS(=O)(=O)N(H)C(=O)c cS(=O)(=O)N(H)(H).ClC(=O)C>>cS(=O)(=O)N(H)C(=O)C cS(=O)(=O)N(H)(H).ClC(=O)c>>cS(=O)(=O)N(H)C(=O)c cS(=O)(=O)N(H)(H).BrC(=O)C>>cS(=O)(=O)N(H)C(=O)C cS(=O)(=O)N(H)(H).BrC(=O)c>>cS(=O)(=O)N(H)C(=O)c cS(=O)(=O)N(H)(H).IC(=O)C>>cS(=O)(=O)N(H)C(=O)C cS(=O)(=O)N(H)(H).IC(=O)c>>cS(=O)(=O)N(H)C(=O)c
This is the recdescent meat:
#!/usr/bin/perl # make_grammar.pl: Creates Grammar.pm use strict; use warnings; use Parse::RecDescent (); my $grammar = <<'__END_OF_GRAMMAR__'; { use strict; use warnings; use List::Util qw( sum ); my %variables=(); sub replicate { my ($l, $r) = @_; my $lt = $l->[0]; my $rt = $r->[0]; die("Type error\n") if $lt ne 'list'; die("Type error\n") if $rt ne 'list'; my $ld = $l->[1]; my $rd = $r->[1]; my @replicated; my $value; foreach (@{$ld}) { $value=$_; foreach(0..$#$rd){push @replicated, $value;} } return [list => \@replicated]; } sub duplicate { my ($l, $r) = @_; my $lt = $l->[0]; my $rt = $r->[0]; die("Type error\n") if $lt ne 'list'; die("Type error\n") if $rt ne 'list'; my $ld = $l->[1]; my $rd = $r->[1]; my @duplicated; foreach(0..$#$rd){push @duplicated, @{$ld};} return [list => \@duplicated]; } sub cross_prod { my ($l, $r) = @_; my ($i, $j)=('',''); my $lt = $l->[0]; my $rt = $r->[0]; die("Type error\n") if $lt ne 'list'; die("Type error\n") if $rt ne 'list'; my $ld = $l->[1]; my $rd = $r->[1]; return [ list => [map { $i=$_; map { $j=$_; $ld->[$i].$rd->[ +$j]} 0..$#$rd} 0..$#$ld]]; } sub dot_prod { my ($l, $r) = @_; my $l_is_string = $l->[0] eq 'string'; my $r_is_string = $r->[0] eq 'string'; my $l_is_list = $l->[0] eq 'list'; my $r_is_list = $r->[0] eq 'list'; my ($ld, $rd) = ($l->[1], $r->[1]); if ($l_is_string && $r_is_string) { return [ string => $ld.$rd ]; } if ($l_is_list && $r_is_list) { die("Size error\n") if @$ld != @$rd; return [ list => [ map { $ld->[$_].$rd->[$_] } 0..$#$rd ] +]; } if ($l_is_string && $r_is_list) { return [ list => [ map { $ld.$_ } @$rd ] ]; } if ($l_is_list && $r_is_string) { return [ list => [ map { $_.$rd } @$ld ] ]; } } } parse: assignment EOF | expr EOF { if($item[1]->[0] eq 'list') { map {print $_."\n";} @{$item[1]->[1]}; } if($item[1]->[0] eq 'string') {print $item[1]->[1]."\n";} $return = $item[1]; } assignment: /\w+/ '->' <commit> expr {$variables{$item[1]}=$item[4] +;} expr : term expr_[ $item[1] ] expr_ : '%%' <commit> term expr_[ replicate($arg[0], $item[3]) ] |'%' <commit> term expr_[ duplicate($arg[0], $item[3]) ] |'**' <commit> term expr_[ cross_prod($arg[0], $item[3]) +] |'*' <commit> term expr_[ dot_prod ($arg[0], $item[3]) + ] | term <commit> expr_[ dot_prod ($arg[0], $item[1]) + ] | { $arg[0] } term : '{' <commit> mbody '}' { my @to_flatten = @{$item[3]}; my @flattened=(); my ($i, $j)=('',''); foreach (@to_flatten) { $i=$_; if($i->[0] eq 'list') { push @flattened, @{$_->[1]}; next; } push @flattened, $i->[1]; } $return = [list => \@flattened]; } | VARIABLE | STRING mbody : <leftop: expr ',' expr> # Tokens VARIABLE: '$' <commit> /\w+/ '$' { $return = $variables{$item[3]}; + } STRING : /[^\{\}\,\*\$\%]+/ { $return = [ string => $item[1]]; } EOF : /\Z/ __END_OF_GRAMMAR__ Parse::RecDescent->Precompile($grammar, 'Grammar') or die("Bad grammar\n");
and what you want to run it
#!/usr/bin/perl # test.pl use strict; use warnings; use Grammar qw( ); my $parser = Grammar->new(); while(<>) { chomp; my $rv = eval { $parser->parse($_) }; my $e = $@; if ($e) { $rv = "$_ = $e"; $rv =~ s/\n\z//; } elsif (!defined($rv)) { $rv = "$_ = Bad Expression"; } } exit;
Any comments/advice appreciated. Thanks again Ikegami.