note
benizi
<p>The "right" way to do this is to use [wp://Finite State Transducer|Finite State Transducers]. They're used quite a bit in morphological analysis (deconstructing a word into its morphemes). I enjoyed [isbn://9781575864334|Finite State Morphology], by Lauri Karttunen and Kenneth R. Beesley. A lot of the material you'll find will be very academic, and the field is a bit Finnish-heavy (Finnish has far richer morphology than English). But one of the attractive features of the technology is its run-time efficiency. There are a couple of widely used toolkits: the [http://www.fsmbook.com|Xerox Finite State Toolkit], which comes with the book I mentioned above (it might have licensing issues), and [http://people.csail.mit.edu/ilh/fst/|the MIT FST Toolkit].</p>
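<p>To give a feel for the idea, here is a minimal sketch of a transducer that maps a surface form such as "cats" to an analysis such as "cat+N+PL". It is in Python rather than Perl, it does not use any of the toolkits above, and the class name <code>FST</code>, the helper <code>build_noun_analyzer</code>, and the tags <code>+N+SG</code> / <code>+N+PL</code> are all made up for illustration. It assumes every noun forms its plural by adding a bare "s", which real morphology certainly does not.</p>

```python
from collections import defaultdict


class FST:
    """A tiny nondeterministic finite state transducer (illustrative only)."""

    def __init__(self):
        self.start = 0
        self._next_state = 1
        # (state, input symbol) -> list of (output string, next state)
        self.arcs = defaultdict(list)
        # accepting state -> output string emitted on acceptance
        self.finals = {}

    def new_state(self):
        state = self._next_state
        self._next_state += 1
        return state

    def add_arc(self, src, symbol, output, dst):
        self.arcs[(src, symbol)].append((output, dst))

    def transduce(self, word):
        """Return every output the machine can produce for `word`, sorted."""
        configs = [(self.start, "")]  # (current state, output so far)
        for ch in word:
            configs = [
                (dst, out + emitted)
                for state, out in configs
                for emitted, dst in self.arcs.get((state, ch), [])
            ]
        results = {out + self.finals[s] for s, out in configs if s in self.finals}
        return sorted(results)


def build_noun_analyzer(stems):
    """Hypothetical helper: stems are copied through, a trailing 's' marks plural."""
    fst = FST()
    for stem in stems:
        state = fst.start
        for ch in stem:
            # Reuse an existing copy-arc so stems share prefixes (a trie).
            shared = [d for o, d in fst.arcs.get((state, ch), []) if o == ch]
            if shared:
                state = shared[0]
            else:
                nxt = fst.new_state()
                fst.add_arc(state, ch, ch, nxt)
                state = nxt
        if state in fst.finals:
            continue  # stem already present
        fst.finals[state] = "+N+SG"  # the bare stem is read as singular
        plural = fst.new_state()
        fst.add_arc(state, "s", "+N+PL", plural)  # consume 's', emit the tag
        fst.finals[plural] = ""
    return fst
```

<p>With <code>analyzer = build_noun_analyzer(["cat", "dog", "bus"])</code>, calling <code>analyzer.transduce("cats")</code> gives a list containing "cat+N+PL", and an unknown word gives an empty list. The run-time cost is one pass over the input, which is the efficiency argument in miniature; the real toolkits add composition, minimization, weights, and a rule compiler on top.</p>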
<p>Some relevant acronyms are WFST, FSA, FSM, and FST, for weighted finite state transducers, finite state automata, finite state machines, and finite state transducers, respectively. I'm pretty sure Google has a relevant toolkit, but I can't seem to find it. (I think it uses yet another acronym, for a class of machines that contains WFSTs.)</p>
<p>None of these is a Perl solution.</p>
<p><em>Update:</em> fixed Wikipedia link</p>
<p><em>Update 2:</em> The Google Research-related kit is [http://www.openfst.org/|OpenFst].</p>
675520
675520