Thanks, pc that helped a lot. In case anybody's wondering what I'm up to, the eventual goal is to write a DFA builder.

Given a list of variable names and recognized functions (at run time), the DFA that it builds should be able to tokenize mathematical expressions that have been written using implied multiplication. for example, if you have variables called 'si' and 'n', and a function called 'sin(', the expression sin(sin) should be tokenized as if the user had entered sin(si*n). The list of tokens produced might look like the following.

ARG,sin( ARG,si ARG,n

I imagine that there is also probably some way to do this using regular expressions, and it would probably involve using look-ahead characters. I think that I might be able to do it faster with recursion, however, and of course the main reason I'm going back to finite state machines is that I'm fascinated with finite state machines! After a lot of consideration, I decided that the DFA object should really only have it's state information as data.

My original idea was to include the buffer and all that other stuff inside the object itself, and then to have different member functions that knew what to do depending on the current state the machine was in. This was a nice idea, but I'm trying to go for speed here, and when you really think about it, passing a bunch of object references around, and then using the $self->{buffer}, $self->{output} notation all the time is bound to have added overhead. So basically, since there really is only two things the machine can do (collect an input character in the buffer, or tokenize what's already in the buffer), I "hard-coded" those two actions into the main loop of the lex function.

If anyone's interested, here is some toy code that can only recognize the tokens 'si' and 'sin('

use strict; use warnings; package DFA; my $dfa = { start =>{nextstate =>{s =>'var1_1orfun1_1'}, entrytoken=>'OTHER'}, var1_1orfun1_1=>{nextstate =>{i =>'var1_2orfun1_2'}}, var1_2orfun1_2=>{exittoken =>{n =>'', default=>'ARG'}, nextstate =>{n =>'fun1_3', s =>'var1_1orfun1_1'}}, fun1_3 =>{nextstate =>{'(' =>'fun1_4'}}, fun1_4 =>{exittoken =>{default=>'ARG'}, nextstate =>{s =>'var1_1orfun1_1'}} }; bless($dfa,"DFA"); sub lex { my $self = shift; my @input = split('',shift); my @output = (); my $buffer = ''; my $currentstate = $self->{start}; for my $c (@input) { #exit action if( my $a = $currentstate->{exittoken} ) { if( my $tt = defined($a->{$c})?$a->{$c}:$a->{default} ) { push(@output,{type=>$tt,value=>$buffer}); $buffer = ''; } }#end exit action if #state transition my $s = $currentstate->{nextstate}->{$c} || 'start'; $currentstate = $self->{$s}; $buffer = $buffer.$c; #entry action if( my $tt = $currentstate->{entrytoken} ) { push(@output,{type=>$tt,value=>$buffer}); $buffer = ''; }#end entry action if }#end for loop #eof exit action if($buffer) { my $a = $currentstate->{exittoken}; my $tt = ($a)?$a->{default}:'OTHER'; push(@output,{type=>$tt,value=>$buffer}); } return @output; }#end function lex my $inputstring = 'sin('; print("an input of $inputstring produced the following output.\n"); for my $tok ($dfa->lex($inputstring)) { print("$tok->{type},$tok->{value}\n"); } $inputstring = 'sisin(si'; print("an input of $inputstring produced the following output.\n"); for my $tok ($dfa->lex($inputstring)) { print("$tok->{type},$tok->{value}\n"); } $inputstring = 'sin(sisin('; print("an input of $inputstring produced the following output.\n"); for my $tok ($dfa->lex($inputstring)) { print("$tok->{type},$tok->{value}\n"); }

Output:

an input of sin( produced the following output. ARG,sin( an input of sisin(si produced the following output. ARG,si ARG,sin( ARG,si an input of sin(sisin( produced the following output. ARG,sin( ARG,si ARG,sin(

I used the logical or (as suggested here) to help deal with default cases and I also tried to use the // operator (as suggested here) to rewrite the expression  my $tt = defined($a->{$c})?$a->{$c}:$a->{default}, but the version of Perl I'm using is too early.

Thanks again for everyone's help! I feel like I'm making some progress now.


In reply to Re^2: Help with Variable Scope by tford
in thread Help with Variable Scope by tford

Title:
Use:  <p> text here (a paragraph) </p>
and:  <code> code here </code>
to format your post, it's "PerlMonks-approved HTML":



  • Posts are HTML formatted. Put <p> </p> tags around your paragraphs. Put <code> </code> tags around your code and data!
  • Titles consisting of a single word are discouraged, and in most cases are disallowed outright.
  • Read Where should I post X? if you're not absolutely sure you're posting in the right place.
  • Please read these before you post! —
  • Posts may use any of the Perl Monks Approved HTML tags:
    a, abbr, b, big, blockquote, br, caption, center, col, colgroup, dd, del, details, div, dl, dt, em, font, h1, h2, h3, h4, h5, h6, hr, i, ins, li, ol, p, pre, readmore, small, span, spoiler, strike, strong, sub, summary, sup, table, tbody, td, tfoot, th, thead, tr, tt, u, ul, wbr
  • You may need to use entities for some characters, as follows. (Exception: Within code tags, you can put the characters literally.)
            For:     Use:
    & &amp;
    < &lt;
    > &gt;
    [ &#91;
    ] &#93;
  • Link using PerlMonks shortcuts! What shortcuts can I use for linking?
  • See Writeup Formatting Tips and other pages linked from there for more info.