Some time ago, I came to the monastery gates and received some much needed help in writing a list based mini language.
Ikegami was very kind in helping me out (references
A simple list oriented language in Perl and
Vector math with recdescent), I’ve made a few simple changes that make this grammar behave the way I need. Basically you can do assignment and evaluation.
Evaluation interpolates and autoflattens lists and variables using dot products (default composition and *) cross products (**) duplication (%) and replication (%%), and spits out interpolations that are new line delimited.
Here is an example input:
hello
hello{1,2}
{a,b}{1,2}
{a,b}**{1,2}
temp->somestring
someotherstring->{a,b}$temp$()
$someotherstring$
{a,b}%{1,2}
{a,b}%%{1,2}
and the output:
hello
hello1
hello2
a1
b2
a1
a2
b1
b2
asomestring()
bsomestring()
a
b
a
b
a
a
b
b
Some of the design choices that were made relate to the application that I am doing using the SMILES language. FWIW I can define sets of reactions (and combinatorial libraries) nicely by specifying them in this language. Here is an example of a set of reactions
amines->{CN(C),{C,c}N(H)}
thiols->{{C,c}S}
alcohols->{cO}
sulfonamides->{{C,c}S(=O)(=O)N(H)}
nSynthons->{$amines$,$thiols$,$alcohols$,$sulfonamides$}
lGroups->{Cl,Br,I}
eSynthons->{C(=O){C,c}}
reactants->{ $nSynthons$(H).}**{$lGroups$**$eSynthons$ }
products->{ $nSynthons$**{$eSynthons$%$lGroups$} }
$reactants$>>$products$
and the output:
CN(C)(H).ClC(=O)C>>CN(C)C(=O)C
CN(C)(H).ClC(=O)c>>CN(C)C(=O)c
CN(C)(H).BrC(=O)C>>CN(C)C(=O)C
CN(C)(H).BrC(=O)c>>CN(C)C(=O)c
CN(C)(H).IC(=O)C>>CN(C)C(=O)C
CN(C)(H).IC(=O)c>>CN(C)C(=O)c
CN(H)(H).ClC(=O)C>>CN(H)C(=O)C
CN(H)(H).ClC(=O)c>>CN(H)C(=O)c
CN(H)(H).BrC(=O)C>>CN(H)C(=O)C
CN(H)(H).BrC(=O)c>>CN(H)C(=O)c
CN(H)(H).IC(=O)C>>CN(H)C(=O)C
CN(H)(H).IC(=O)c>>CN(H)C(=O)c
cN(H)(H).ClC(=O)C>>cN(H)C(=O)C
cN(H)(H).ClC(=O)c>>cN(H)C(=O)c
cN(H)(H).BrC(=O)C>>cN(H)C(=O)C
cN(H)(H).BrC(=O)c>>cN(H)C(=O)c
cN(H)(H).IC(=O)C>>cN(H)C(=O)C
cN(H)(H).IC(=O)c>>cN(H)C(=O)c
CS(H).ClC(=O)C>>CSC(=O)C
CS(H).ClC(=O)c>>CSC(=O)c
CS(H).BrC(=O)C>>CSC(=O)C
CS(H).BrC(=O)c>>CSC(=O)c
CS(H).IC(=O)C>>CSC(=O)C
CS(H).IC(=O)c>>CSC(=O)c
cS(H).ClC(=O)C>>cSC(=O)C
cS(H).ClC(=O)c>>cSC(=O)c
cS(H).BrC(=O)C>>cSC(=O)C
cS(H).BrC(=O)c>>cSC(=O)c
cS(H).IC(=O)C>>cSC(=O)C
cS(H).IC(=O)c>>cSC(=O)c
cO(H).ClC(=O)C>>cOC(=O)C
cO(H).ClC(=O)c>>cOC(=O)c
cO(H).BrC(=O)C>>cOC(=O)C
cO(H).BrC(=O)c>>cOC(=O)c
cO(H).IC(=O)C>>cOC(=O)C
cO(H).IC(=O)c>>cOC(=O)c
CS(=O)(=O)N(H)(H).ClC(=O)C>>CS(=O)(=O)N(H)C(=O)C
CS(=O)(=O)N(H)(H).ClC(=O)c>>CS(=O)(=O)N(H)C(=O)c
CS(=O)(=O)N(H)(H).BrC(=O)C>>CS(=O)(=O)N(H)C(=O)C
CS(=O)(=O)N(H)(H).BrC(=O)c>>CS(=O)(=O)N(H)C(=O)c
CS(=O)(=O)N(H)(H).IC(=O)C>>CS(=O)(=O)N(H)C(=O)C
CS(=O)(=O)N(H)(H).IC(=O)c>>CS(=O)(=O)N(H)C(=O)c
cS(=O)(=O)N(H)(H).ClC(=O)C>>cS(=O)(=O)N(H)C(=O)C
cS(=O)(=O)N(H)(H).ClC(=O)c>>cS(=O)(=O)N(H)C(=O)c
cS(=O)(=O)N(H)(H).BrC(=O)C>>cS(=O)(=O)N(H)C(=O)C
cS(=O)(=O)N(H)(H).BrC(=O)c>>cS(=O)(=O)N(H)C(=O)c
cS(=O)(=O)N(H)(H).IC(=O)C>>cS(=O)(=O)N(H)C(=O)C
cS(=O)(=O)N(H)(H).IC(=O)c>>cS(=O)(=O)N(H)C(=O)c
This is the recdescent meat:
#!/usr/bin/perl
# make_grammar.pl: Creates Grammar.pm
use strict;
use warnings;
use Parse::RecDescent ();
my $grammar = <<'__END_OF_GRAMMAR__';
{
use strict;
use warnings;
use List::Util qw( sum );
my %variables=();
sub replicate {
my ($l, $r) = @_;
my $lt = $l->[0];
my $rt = $r->[0];
die("Type error\n") if $lt ne 'list';
die("Type error\n") if $rt ne 'list';
my $ld = $l->[1];
my $rd = $r->[1];
my @replicated;
my $value;
foreach (@{$ld})
{
$value=$_;
foreach(0..$#$rd){push @replicated, $value;}
}
return [list => \@replicated];
}
sub duplicate {
my ($l, $r) = @_;
my $lt = $l->[0];
my $rt = $r->[0];
die("Type error\n") if $lt ne 'list';
die("Type error\n") if $rt ne 'list';
my $ld = $l->[1];
my $rd = $r->[1];
my @duplicated;
foreach(0..$#$rd){push @duplicated, @{$ld};}
return [list => \@duplicated];
}
sub cross_prod {
my ($l, $r) = @_;
my ($i, $j)=('','');
my $lt = $l->[0];
my $rt = $r->[0];
die("Type error\n") if $lt ne 'list';
die("Type error\n") if $rt ne 'list';
my $ld = $l->[1];
my $rd = $r->[1];
return [ list => [map { $i=$_; map { $j=$_; $ld->[$i].$rd->[
+$j]} 0..$#$rd} 0..$#$ld]];
}
sub dot_prod {
my ($l, $r) = @_;
my $l_is_string = $l->[0] eq 'string';
my $r_is_string = $r->[0] eq 'string';
my $l_is_list = $l->[0] eq 'list';
my $r_is_list = $r->[0] eq 'list';
my ($ld, $rd) = ($l->[1], $r->[1]);
if ($l_is_string && $r_is_string) {
return [ string => $ld.$rd ];
}
if ($l_is_list && $r_is_list) {
die("Size error\n") if @$ld != @$rd;
return [ list => [ map { $ld->[$_].$rd->[$_] } 0..$#$rd ]
+];
}
if ($l_is_string && $r_is_list) {
return [ list => [ map { $ld.$_ } @$rd ] ];
}
if ($l_is_list && $r_is_string) {
return [ list => [ map { $_.$rd } @$ld ] ];
}
}
}
parse: assignment EOF
| expr EOF
{
if($item[1]->[0] eq 'list')
{
map {print $_."\n";} @{$item[1]->[1]};
}
if($item[1]->[0] eq 'string')
{print $item[1]->[1]."\n";}
$return = $item[1];
}
assignment: /\w+/ '->' <commit> expr {$variables{$item[1]}=$item[4]
+;}
expr : term expr_[ $item[1] ]
expr_ : '%%' <commit> term expr_[ replicate($arg[0], $item[3]) ]
|'%' <commit> term expr_[ duplicate($arg[0], $item[3]) ]
|'**' <commit> term expr_[ cross_prod($arg[0], $item[3])
+]
|'*' <commit> term expr_[ dot_prod ($arg[0], $item[3])
+ ]
| term <commit> expr_[ dot_prod ($arg[0], $item[1])
+ ]
| { $arg[0] }
term : '{' <commit> mbody '}'
{
my @to_flatten = @{$item[3]};
my @flattened=();
my ($i, $j)=('','');
foreach (@to_flatten)
{
$i=$_;
if($i->[0] eq 'list')
{
push @flattened, @{$_->[1]};
next;
}
push @flattened, $i->[1];
}
$return = [list => \@flattened];
}
| VARIABLE
| STRING
mbody : <leftop: expr ',' expr>
# Tokens
VARIABLE: '$' <commit> /\w+/ '$' { $return = $variables{$item[3]};
+ }
STRING : /[^\{\}\,\*\$\%]+/ { $return = [ string => $item[1]]; }
EOF : /\Z/
__END_OF_GRAMMAR__
Parse::RecDescent->Precompile($grammar, 'Grammar')
or die("Bad grammar\n");
and what you want to run it
#!/usr/bin/perl
# test.pl
use strict;
use warnings;
use Grammar qw( );
my $parser = Grammar->new();
while(<>)
{
chomp;
my $rv = eval { $parser->parse($_) };
my $e = $@;
if ($e) {
$rv = "$_ = $e";
$rv =~ s/\n\z//;
} elsif (!defined($rv)) {
$rv = "$_ = Bad Expression";
}
}
exit;
Any comments/advice appreciated. Thanks again Ikegami.