Dwood has asked for the wisdom of the Perl Monks concerning the following question:

Data::SExpression

I was writing a compiler/parser/interpreter in Perl, but then I realized the way I'm doing it is almost completely wrong (I'm not tokenizing properly!).

Then someone told me that I should be using Data::SExpression because the language syntax is almost identical to LISP's (shown in the readmore. :) ), with the exception of comments (signified by the semicolon: ;).

My question is, how do I utilize SExpression for what I need to do?

I'm doing some tests right now with it but I'm still fairly confused, and would really appreciate a tutorial on it... My current code:

    use strict;
    use warnings;

    my %hash_glob;
    my %hash_scrip;
    my @value_types = qw(
        boolean real short long string trigger_volume cutscene_flag
        cutscene_camera_point cutscene_title cutscene_recording device_group
        ai ai_command_list starting_profile conversation hud_message
        object_list sound effect damage looping_sound animation_graph
        actor_variant damage_effect object_definition game_difficulty team
        ai_default_state actor_type hud_corner object unit vehicle weapon
        device scenery object_name unit_name vehicle_name weapon_name
        device_name scenery_name
    );
    my @script_types = qw(startup dormant continuous static stub);
    my $glob_count = 0;
    my $pre_decl   = " ";

    # Returns 1 if the string holds no alphanumeric characters, else 0.
    sub empty {
        my $FILTER = 'a-zA-Z0-9';
        my $string = shift(@_);
        my $test   = $string;
        $test =~ s/[^$FILTER]//go;
        if ($test) {
            print "\n";
            return 0;
        }
        else {
            print "\nlast line is empty!\n";
            return 1;
        }
    }

    # Returns 1 if the string has as many '(' as ')', else 0.
    sub parenth_count {
        my $string  = shift(@_);
        my $l_count = () = ($string =~ /\(/g);
        my $r_count = () = ($string =~ /\)/g);
        if ($l_count != $r_count) {
            print "There is no balance on this line ($l_count vs $r_count)! Did you format it correctly???\n";
            return 0;
        }
        else {
            print "There is balance, but is the logic correct?\n";
            return 1;
        }
    }

    # Returns the whitespace-separated word at index $i, or " " if there is none.
    sub remove_all_but {
        my @tempArray = split(" ", shift(@_));
        my $i = shift(@_);
        if (exists $tempArray[$i]) {
            my $string = $tempArray[$i];
            print "\n $string \n";
            return $string;
        }
        else {
            return " ";
        }
    }

    sub check_value_assigned {
        my $string = shift(@_);
        if ($string eq " ") {
            print "\n There is no value assigned!!!!\n";
            return "undef";
        }
        else {
            print "$string\n";
            return $string;
        }
    }

    sub get_type_nohash {
        my $string = shift(@_);
        $string = remove_all_but($string, 0);
        print "Getting type!\n";
        $string = check_value_assigned($string);
        return $string;
    }

    sub get_value_type_nohash {
        my $string = shift(@_);
        $string = remove_all_but($string, 1);
        print "Getting value type!\n";
        $string = check_value_assigned($string);
        return $string;
    }

    sub get_name_nohash {
        my $string = shift(@_);
        $string = remove_all_but($string, 2);
        print "Getting Name!\n";
        $string = check_value_assigned($string);
        return $string;
    }

    sub get_val_nohash {
        my $string = shift(@_);
        $string = remove_all_but($string, 3);
        print "Getting value!\n";
        $string = check_value_assigned($string);
        return $string;
    }

    sub global_check {
        my $string     = shift(@_);
        my $indices    = index($string, "global");
        my $declarator = substr $string, $indices, 7;
        if ($pre_decl ne $declarator) {
            print "Global found!\n";
            $pre_decl = $declarator;
            return 1;
        }
        else {
            print "No global found on this line!";
            return 0;
        }
    }

    sub global_operations {
        my $string = shift(@_);
        my $name   = get_name_nohash($string);
        # Use the lexical %hash_glob declared above, not an unrelated $hash_glob reference.
        if (!exists $hash_glob{$name}) {
            print "\nMade it to the globs!\n";
            $hash_glob{$name} = {
                NAME  => "e\n",
                TYPE  => "BOOLEAN",
                VALUE => "value"
            };
        }
        else {
            print "The global $name already exists!\n";
        }
    }

    sub basic_operations {
        my $testString = shift(@_);
        if (empty($testString) == 0) {
            print "There is stuff in string!\n";
            if (parenth_count($testString) == 1) {
                if (global_check($testString) == 1) {
                    print "Global check made it through alive!\n";
                    global_operations($testString);
                }
            }
        }
    }

    my $placeholder = " ";
    my $baseString  = "(global boolean dwood_is_leet true)";
    basic_operations($baseString);


An example of a script I should be able to parse/interpret/compile:

    (global short rounds_completed 0)

    (script static unit player
      (unit (list_get (players) 0)))

    (script static void music_ambercladremix
      (sound_looping_start "sound\moooseguy\sound_looping\ambercladremix" none 1)
    )

    (script dormant death_cutscene
      (enable_hud_help_flash false)
      (show_hud_help_text false)
      (sleep 30)
      (camera_set death7 100)
      (sleep 50)
      (camera_set death8 110)
      (cinematic_set_title gameover)
      (sleep 150)
      (fade_out 0 0 0 30)
      (sleep 50)
      (cinematic_stop)
      (wake credits_cutscene)
    )

Re: Parsing SExpressions (lisp)
by GrandFather (Saint) on Nov 21, 2010 at 01:01 UTC

    Given your sample input, what result do you expect? If the result is too complicated to express, perhaps you should start with a simpler example?

    True laziness is hard work
      You are correct, I do need to start with a simpler example.

      I should be able to get a global's type and put it into the form:

          $hash_glob{$name} = { NAME => $name, TYPE => $type, VALUE => $value };


      I should be able to take a script and break it into the form:

          $hash_scripts{$name} = {
              NAME    => $name,
              TYPE    => $type,     # continuous, static, dormant, stub, startup are the only valid 'types'
              RETTYPE => $rettype,  # all of the possible value types, located in array: @return_value_types
          };

      I also want to count the number of "nodes" (or the number of characters) the script has, as well as where it starts and ends. I don't really know if there's anything else that I need to include...
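One way the two target structures could be filled is sketched below. It is only a sketch under stated assumptions, not Data::SExpression's API: `read_forms`, `count_nodes` and `collect` are hypothetical helper names, `;` is assumed to start a comment that runs to end of line, strings are assumed to contain no embedded double quotes, and (judging only from the sample script) `static` and `stub` scripts are assumed to be the ones that declare a return type. Start and end positions are not tracked here.

```perl
use strict;
use warnings;

# Compact reader: returns the top-level forms as nested array refs.
# Assumptions: ';' comments run to end of line; strings have no embedded quotes.
sub read_forms {
    my ($src) = @_;
    my @stack = ([]);                      # $stack[0] collects top-level forms
    while ($src =~ /\G\s*(?:;[^\n]*\s*)*([()]|"[^"]*"|[^\s()";]+)/gc) {
        my $tok = $1;
        if ($tok eq '(') {
            push @stack, [];
        }
        elsif ($tok eq ')') {
            die "Unbalanced ')'\n" if @stack == 1;
            my $list = pop @stack;
            push @{ $stack[-1] }, $list;
        }
        else {
            push @{ $stack[-1] }, $tok;
        }
    }
    die "Missing ')'\n" if @stack > 1;
    return @{ $stack[0] };
}

# Count every list and atom beneath (and including) a node.
sub count_nodes {
    my ($node) = @_;
    return 1 unless ref $node;
    my $n = 1;
    $n += count_nodes($_) for @$node;
    return $n;
}

# Hypothetical helper: sort top-level forms into the two hashes.
sub collect {
    my @forms = @_;
    my (%hash_glob, %hash_scripts);
    for my $form (@forms) {
        next unless ref $form eq 'ARRAY' && @$form;
        my ($kind, @rest) = @$form;
        if ($kind eq 'global') {
            my ($type, $name, $value) = @rest;
            die "The global $name already exists\n" if exists $hash_glob{$name};
            $hash_glob{$name} = { NAME => $name, TYPE => $type, VALUE => $value };
        }
        elsif ($kind eq 'script') {
            my $type = shift @rest;
            # Assumption: only static and stub scripts declare a return type.
            my $rettype = ($type eq 'static' || $type eq 'stub') ? shift @rest : undef;
            my $name = shift @rest;
            $hash_scripts{$name} = {
                NAME    => $name,
                TYPE    => $type,
                RETTYPE => $rettype,
                BODY    => [@rest],              # remaining forms, still nested
                NODES   => count_nodes($form),
            };
        }
    }
    return (\%hash_glob, \%hash_scripts);
}

my ($globals, $scripts) = collect(read_forms(
    '(global short rounds_completed 0) (script static unit player (unit (list_get (players) 0)))'
));
print "$_: $globals->{$_}{TYPE} = $globals->{$_}{VALUE}\n" for sort keys %$globals;
print "$_: $scripts->{$_}{TYPE}, $scripts->{$_}{NODES} nodes\n" for sort keys %$scripts;
```

Keeping the body as nested array refs means the same structure can later be walked to validate types or to execute commands.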
Re: Parsing SExpressions (lisp)
by ELISHEVA (Prior) on Nov 21, 2010 at 08:09 UTC

    You could use Data::SExpression, and learning Lisp lingo is definitely worth the effort if you plan on doing a lot of parsing. However, I think your first intuitions a while back about using classes were probably right. Here's a fairly simple parser (in under 300 lines) that should do the basic extraction of commands and bind them to classes that can execute the command and tell you what line number it was found on and how deeply nested it is. And of course you could walk the nested commands to do further analysis.

    I've done a "fill in the blank here" example below. The code works but does pointless things when it executes the script (basically it just reports on the command and its arguments). Please feel free to ask questions. The point is to understand it and be able to adapt it to your needs, but using recursion and classes is easier to show than talk about, so here it is:

    Update: added content to __DATA__ section and fixed line reading if ($sAtom) to if (defined($sAtom))
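A minimal sketch of the approach described above (recursion, one node per command, line number and nesting depth recorded), written independently of the listing the post refers to. The names `tokenize`, `parse_node` and `parse_script` are hypothetical, plain hashes stand in for classes, and it assumes that `;` comments run to end of line and that strings contain no embedded double quotes.

```perl
use strict;
use warnings;

# Tokenize into [text, line] pairs, skipping whitespace and ';' comments.
sub tokenize {
    my ($src) = @_;
    my @tokens;
    my $line = 1;
    while ($src =~ /\G(\s+|;[^\n]*|[()]|"[^"]*"|[^\s()";]+)/gc) {
        my $tok = $1;
        push @tokens, [ $tok, $line ] unless $tok =~ /^(?:\s|;)/;
        $line += ($tok =~ tr/\n//);        # keep the line counter in step
    }
    die "Unparseable input near line $line\n" if (pos($src) // 0) < length $src;
    return @tokens;
}

# Recursive reader: consumes tokens from the front of @$tokens and returns
# one node. A list becomes { cmd, line, depth, args }; an atom stays a string.
sub parse_node {
    my ($tokens, $depth) = @_;
    my $next = shift @$tokens or die "Unexpected end of input\n";
    my ($text, $line) = @$next;
    die "Unexpected ')' on line $line\n" if $text eq ')';
    return $text unless $text eq '(';

    my @items;
    while (1) {
        die "Missing ')' for list opened on line $line\n" unless @$tokens;
        if ($tokens->[0][0] eq ')') { shift @$tokens; last }
        push @items, parse_node($tokens, $depth + 1);
    }
    my $cmd = shift @items;                # first element names the command
    return { cmd => $cmd, line => $line, depth => $depth, args => \@items };
}

# Parse a whole script into a list of top-level command nodes.
sub parse_script {
    my @tokens = tokenize(shift);
    my @forms;
    push @forms, parse_node(\@tokens, 0) while @tokens;
    return @forms;
}

my $demo = "(script dormant death_cutscene\n  (sleep 30) ; pause\n  (camera_set death7 100))";
for my $form (parse_script($demo)) {
    print "$form->{cmd} on line $form->{line}\n";
    for my $arg (grep { ref } @{ $form->{args} }) {
        print "  $arg->{cmd} (line $arg->{line}, depth $arg->{depth})\n";
    }
}
```

Blessing each node into a class named after its command would be one way to attach an `execute` method per command, which is closer to what the post describes.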

      Been chomping away at your code (thanks for it, I appreciate you helping me so much!), and am not sure of what approach to take with scripts, perhaps I should add a marker that tells us what script a command belongs to ..? ie a script is a new 'function' and can be called at any point.

      I'm not sure; I'm going to keep working at it to be sure I understand entirely what goes on in the parser before I make any major adjustments. :) I'll probably (if the command is anything but continuous) add them to user-defined functions. UPDATERINGER: I have the parser now so that scripts (i.e. functions) get added to a hash and can therefore be looked up... Time for a whole new package for variables (types of variables have size limitations) then, I suppose.
Re: Parsing SExpressions (lisp)
by LanX (Saint) on Nov 21, 2010 at 14:50 UTC
    As I said yesterday at the CB, I would rather use regexes to transform an S-expr into a Perl M-expr and eval it, thereby delegating most of the parsing to Perl's interpreter.

    You need to know that LISP syntax treats command code and data code equally as lists!

    So it's very likely that you'll need to "execute" the code sooner or later, just static parsing is not enough.

    As an example, these brute force regexes ...

        my $data = do { local $/; <DATA> };
        $data =~ s/\(\s*(\w+)(\W)/$1\($2/gs;       # s-expr -> m-expr functions
        $data =~ s/\)(?=[\s\w]+)/),/g;             # comma after functions
        $data =~ s/\s([\w"\\]+)(?=[\s)])/$1,/g;    # comma separated args
        print $data;

    prints

        global(short,rounds_completed,0,),
        script(static,unit,player,
         unit( list_get( players(),0,))),
        script(static,void,music_ambercladremix,
         sound_looping_start( "sound\moooseguy\sound_looping\ambercladremix", none,1,),
        ),
        script(dormant,death_cutscene,
         enable_hud_help_flash(false,),
         show_hud_help_text(false,),
         sleep(30,),
         camera_set(death7,100,),
         sleep(50,),
         camera_set(death8,110,),
         cinematic_set_title(gameover,),
         sleep(150,),
         fade_out(0,0,0,30,),
         sleep(50,),
         cinematic_stop(),
         wake(credits_cutscene,),
        ),

    Now simply defining these routines as perl subs in a dedicated package should be sufficient to eval the code, much like a DSL (domain specific language).

    Be aware that additional leading underscores _ and \U for uppercasing in the substitution would avoid any possible conflicts with Perl's built-in function names.
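As a rough illustration of that renaming, the first substitution could emit an underscore plus the uppercased name. This is only a sketch: `to_mexpr` is a hypothetical wrapper, and the other two substitutions are the ones shown earlier, unchanged.

```perl
use strict;
use warnings;

# Rewrite S-expression text as M-expression text, renaming every function
# to _UPPERCASE so it cannot collide with a Perl built-in such as sleep.
sub to_mexpr {
    my ($data) = @_;
    $data =~ s/\(\s*(\w+)(\W)/_\U$1\E($2/gs;    # (name ...  ->  _NAME( ...
    $data =~ s/\)(?=[\s\w]+)/),/g;              # comma after functions
    $data =~ s/\s([\w"\\]+)(?=[\s)])/$1,/g;     # comma separated args
    return $data;
}

print to_mexpr('(sleep 30)'), "\n";                # _SLEEP(30,)
print to_mexpr('(camera_set death7 100)'), "\n";   # _CAMERA_SET(death7,100,)
```

With this in place `(sleep 30)` becomes a call to a user-defined `_SLEEP` rather than to Perl's own `sleep`.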

    And you should read the perldocs for AUTOLOAD about catching and handling undefined function names. Barewords in LISP are, much like in Perl, internally just functions.

    UPDATE
    I think I was wrong about the latter, Perl wouldn't always catch barewords like death7 in AUTOLOAD, so better append () or prepend & to make it clear.
        &death7;
        death6();
        sub AUTOLOAD { print "undefined sub $AUTOLOAD was called\n"; }
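Putting the pieces together, a dedicated package with an AUTOLOAD could look roughly like the sketch below. The package name `HSC`, the `@calls` log and the `run_mexpr` helper are assumptions made for illustration, and the input uses semicolons between statements and `()` on former barewords, as suggested in the update above.

```perl
use strict;
use warnings;

package HSC;    # hypothetical package name for the DSL

our $AUTOLOAD;
our @calls;     # log of [name, args...] entries, in call order

# Catch any sub that is not defined in this package.
sub AUTOLOAD {
    my @args = @_;
    (my $name = $AUTOLOAD) =~ s/.*:://;    # strip the package qualifier
    return if $name eq 'DESTROY';          # never treat object cleanup as a command
    push @calls, [ $name, @args ];
    return $name;                          # a former bareword evaluates to its own name
}

package main;

# Evaluate m-expr text inside the DSL package and return the call log.
sub run_mexpr {
    my ($code) = @_;
    @HSC::calls = ();
    eval "package HSC; $code; 1" or die "eval failed: $@";
    return @HSC::calls;
}

my @log = run_mexpr('_SLEEP(30,); _CAMERA_SET(death7(),100,); _CINEMATIC_STOP();');
print join(' ', @$_), "\n" for @log;
```

Arguments are evaluated before the call that receives them, so `death7` is logged ahead of `_CAMERA_SET`. Real command subs defined in the package take precedence over AUTOLOAD, which then only handles the names nobody has implemented yet.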

    Cheers Rolf