in reply to Re: finite automata
in thread finite automata

I may, of course, be misunderstanding the problem, but it sounds to me like you're not given a complete listing of the language's dictionary - simply a set of rules that valid words must obey. In that case pjf's regex-based solution is far more efficient (assuming, of course, that you can represent each of the rules as a regex).
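
For what it's worth, here's a minimal sketch of the kind of thing I mean. The specific rule patterns and the valid_word name are just made-up placeholders; the point is that each rule becomes a regex and a word is valid only if it matches all of them:

use strict;
use warnings;

# Hypothetical rules - substitute whatever the real language requires.
my @rules = (
    qr/^[a-z]+$/,          # only lowercase letters
    qr/[aeiou]/,           # must contain at least one vowel
    qr/^(?!.*(.)\1\1)/,    # no letter repeated three times in a row
);

sub valid_word {
    my ($word) = @_;
    for my $rule (@rules) {
        return 0 unless $word =~ $rule;
    }
    return 1;
}

print valid_word($_) ? "$_: valid\n" : "$_: invalid\n" for qw(banana qzzzt);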

--
<http://www.dave.org.uk>

"The first rule of Perl club is you don't talk about Perl club."

Re: Re: Re: finite automata
by doc (Scribe) on Oct 02, 2001 at 22:36 UTC

    As you note, it does rather depend on what the problem is. Depending on the situation and the complexity of the language, a pre-generated hash lookup table can be much faster than a regex solution.

    Consider a simple language whose words may only take the form aa, ab, ac, ad, ..., az. Even including the overhead of generating the hash lookup table, the hash method is much faster than a comparable regex method, as well as being far more flexible.

use Benchmark;

$string = 'aa ab ac ad ae af ' x 10000 . ' ff';
@string = split /\s/, $string;

$regex = <<'CODE';
&regex;
sub regex {
    do { return 0 unless /^a[a-z]$/ } for @string;
    return 1;
}
CODE

$hash = <<'CODE';
$hash{$_}++ for 'aa' .. 'az';
&hash;
sub hash {
    do { return 0 unless defined $hash{$_} } for @string;
    return 1;
}
CODE

timethese( 100, { 'regex' => $regex, 'hash' => $hash } );

__END__
Benchmark: timing 100 iterations of hash, regex...
      hash: 21 wallclock secs (20.37 usr + 0.00 sys = 20.37 CPU) @ 4.91/s (n=100)
     regex: 45 wallclock secs (45.10 usr + 0.00 sys = 45.10 CPU) @ 2.22/s (n=100)

    doc

    print(s??cod??scalar reverse :p)