For a complete answer, find a good book about compilers and look
up subset construction. The Dragon Book (aka:
Compilers: Principles, Techniques and Tools) by Aho, Sethi, &
Ullman is the canonical reference.
In less formal terms, you track the states of an NFA by combining
all the possible transitions each step of the way. If input
character 'a' can take you to states B, C, and D, you then check B,
C, and D to see if they have transitions for the next input
character. In effect, the set {B,C,D} of nondeterministic states
is itself a state in a deterministic finite automaton (DFA) that
recognizes the same string.
Processing that ever-growing list of states that might have
transitions is nontrivial work. The traditional solution uses two
stacks: stack one contains all the states you want to check for
transitions, and whenever you find a valid transition you push the
result state onto stack two. When stack one is finally empty, you
advance to the next input character, swap stack one with stack two,
and start over again.
The process becomes even more fun when your NFA contains what are
called epsilon transitions.. 'automatic' transitions that
link two states without requiring any input. Epsilon transitions
make it easier to model things like Kleene closures (the splat
operator ('*') in a regular expression) in ways that are
(comparatively) easy to read:
a DFA that recognizes the string 'a':
(s0)- a ->(s1)
an NFA with epsilon transitions that recognizes
the string 'a*':
+-->(s1)- a ->(s2)- e -+
| |
+-->(s0)- e -+----------------------+-->(s3)- e -+
| |
+------------------------------------------------+
To find the set of states that might possibly accept the next
input character, you have to follow all the epsilon transitions to
states you can reach without having to accept any input. Then you
have to follow any e-transitions those states might have, etcetera.
In the diagram above, for instance, you have to follow three
e-transitions before you can recognize the second 'a': one from s2
to s3, a second from s3 to s0, and a third from s0 to s1. The
result is known as the e-closure of the original state. You
also have to watch out for e-transitions that will take you in a
loop.. from s0 to s3, and back to s0, for instance.
During subset construction, you have to calculate the e-closure
of a whole set of NFA states. You can do that as soon as you find
a valid transition (if the transition is valid, push the e-closure
of the result state onto stack two), or you can do it before looking
for transitions (pop the next NFA state off stack one, find the
e-closure, and check all those states for transitions).
Either way, it takes some fiddling around.
Both the process of building an e-closure and the process of
searching for transitions break down nicely for a divide-and-conquer
strategy, though. If a state is smart enough to find its own
e-transitions (fairly trivial), it can then ask all its e-neighbors
to report their own e-transitions. The same is true for checking
transitions. Therefore, it makes sense to turn the states into
objects that are smart enough to handle their own chunk of the
calculation.
|