Re: finite automata

This will get you started:

#!/usr/bin/perl -w
use strict;

# note: 'hacker' is not in the word list under __DATA__ (only 'hack' is),
# so as written this dies with "Word 'hacker' not in language!"
my $string = 'just another perl hacker';

# generate a lookup hash whose keys are the words in our language;
# incrementing with ++ creates each key, and the keys are all we use
my %lang;
for (<DATA>) {
    chomp;
    $lang{$_}++;
}

# split the test string on runs of whitespace to get an array of
# 'words', where a word is any sequence of non-whitespace characters
my @bits = split ' ', $string;

# check each word against our language specification
for (@bits) {
    die "Word '$_' not in language!\n" unless exists $lang{$_};
}

# if we have not died then all the words are OK
print "Success, \$string only contains words in language!\n";

__DATA__
just
another
finite
automaton
perl
hack
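As a variation on the same hash-lookup idea, here is a minimal sketch that wraps it in subroutines and reports every word that is not in the language, instead of dying on the first one. The subroutine names `build_language` and `words_not_in_language` are hypothetical, chosen only for illustration; the word list is the one from `__DATA__` above.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Build a lookup hash from a list of words. Only the keys matter,
# so assigning an empty list to a hash slice is enough.
sub build_language {
    my %lang;
    @lang{@_} = ();
    return \%lang;
}

# Return every word in $string that is not a key of %$lang.
# split ' ' splits on runs of whitespace and ignores leading whitespace.
sub words_not_in_language {
    my ($lang, $string) = @_;
    return grep { !exists $lang->{$_} } split ' ', $string;
}

my $lang = build_language(qw(just another finite automaton perl hack));
my @bad  = words_not_in_language($lang, 'just another perl hacker');

if (@bad) {
    print "Not in language: @bad\n";
}
else {
    print "All words in language\n";
}
```

With this word list the sketch prints `Not in language: hacker`, since only `hack` is in the language.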

doc

print(s<>ecode?scalar reverse :p)


Re: Re: finite automata by davorg (Chancellor) on Oct 02, 2001 at 16:10 UTC
I may, of course, be misunderstanding the problem, but it sounds to me like you're not given a complete listing of the language dictionary, just a set of rules that valid words must obey. In that case pjf's regex-based solution is far more efficient (assuming, of course, that you can represent each of the rules as a regex).

--
<http://www.dave.org.uk>

"The first rule of Perl club is you don't talk about Perl club."
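When the language is defined by rules rather than a word list, one way to sketch the regex approach is to keep a list of compiled patterns and accept a word if it matches any of them. This is not pjf's code, which isn't shown here; the rule set and the names `@rules`, `is_valid_word` and `string_in_language` are hypothetical.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical rule set: a word is valid if it matches ANY of these
# patterns. The rules are illustrative only, not taken from the thread.
my @rules = (
    qr/^a[a-z]$/,    # two lowercase letters, the first being 'a'
    qr/^perl$/,      # a single literal word
);

# A word is valid if at least one rule matches it.
sub is_valid_word {
    my ($word) = @_;
    for my $rule (@rules) {
        return 1 if $word =~ $rule;
    }
    return 0;
}

# A string is in the language if every whitespace-separated word is valid.
sub string_in_language {
    my ($string) = @_;
    for my $word (split ' ', $string) {
        return 0 unless is_valid_word($word);
    }
    return 1;
}

print string_in_language('aa ab perl') ? "valid\n" : "invalid\n";
```

Unlike the hash lookup, this never needs the full list of words, only the rules that describe them.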
Re: Re: Re: finite automata by doc (Scribe) on Oct 02, 2001 at 22:36 UTC
As you note, it does rather depend on what the problem is. Depending on the situation and the complexity of the language, a pre-generated hash lookup table can be much faster than a regex solution. Consider a simple language that may only contain the words aa, ab, ac, ad, ..., az. Even including the overhead of generating the hash lookup table, the hash method is much faster than a comparable regex method, as well as being far more flexible.

use Benchmark;

# test data: the six words aa..af repeated 10,000 times, then 'ff'
$string = 'aa ab ac ad ae af ' x 10000 . ' ff';
@string = split /\s/, $string;

# regex method: every word must match /^a[a-z]$/
$regex = <<'CODE';
&regex;
sub regex {
    do{return 0 unless /^a[a-z]$/} for @string;
    return 1;
}
CODE

# hash method: build the lookup table (aa..az) on each run, then check each word
$hash = <<'CODE';
$hash{$_}++ for aa..az;
&hash;
sub hash {
    do{return 0 unless defined $hash{$_}} for @string;
    return 1;
}
CODE

timethese ( 100, {
    'regex' => $regex,
    'hash'  => $hash
} );

__END__
Benchmark: timing 100 iterations of hash, regex...
hash: 21 wallclock secs (20.37 usr + 0.00 sys = 20.37 CPU) @ 4.91/s (n=100)
regex: 45 wallclock secs (45.10 usr + 0.00 sys = 45.10 CPU) @ 2.22/s (n=100)
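For comparison, here is a sketch of the same benchmark written with `use strict`, code references instead of code strings, and Benchmark's `cmpthese`, which prints a table of relative rates. The smaller data set (1,000 repetitions rather than 10,000), the use of `exists`, and the subroutine names `all_match_regex` and `all_in_hash` are choices made here, not part of the original post. Timings depend on your perl and hardware, so no results are claimed.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

# Test data in the spirit of the benchmark above: many words of the
# form a[a-z], followed by one word ('ff') outside the language.
my @words = ((qw(aa ab ac ad ae af)) x 1000, 'ff');

# Regex method: every word must match /^a[a-z]$/.
sub all_match_regex {
    my ($words) = @_;
    for (@$words) { return 0 unless /^a[a-z]$/ }
    return 1;
}

# Hash method: build the lookup table (aa..az) on every call, so its
# construction cost is included, then test membership with exists.
sub all_in_hash {
    my ($words) = @_;
    my %lang;
    @lang{ 'aa' .. 'az' } = ();
    for (@$words) { return 0 unless exists $lang{$_} }
    return 1;
}

# Run each sub for at least 1 CPU second and print a comparison table.
cmpthese(-1, {
    regex => sub { all_match_regex(\@words) },
    hash  => sub { all_in_hash(\@words) },
});
```

Both subs return 0 for this data because of the trailing `ff`; like the original, the benchmark measures the cost of scanning the whole list.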