It sounds like you know what algorithms you want to use, but you're fighting the infrastructure -- sometimes very simple ideas end up surrounded by huge amounts of I/O and user interface code.
During development, it helps to forget about I/O and just use something like Data::Dumper or Storable. Those modules can save whatever data structures you use -- including nested and recursive hashes. You don't have to fight the file I/O until you've solved your application problems. Your data structures remain fluid.
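To make that concrete, here is a minimal sketch using Storable's `store` and `retrieve`. The `%state` layout, its keys, and the temporary file are assumptions made up for illustration, not something taken from your program:

```perl
use strict;
use warnings;
use Storable qw(store retrieve);
use File::Temp qw(tempfile);

# Hypothetical nested structure: a small word dictionary plus one sentence.
my %state = (
    dictionary => { what => { tag => 'WP' }, is => { tag => 'VBZ' } },
    sentences  => [ [qw(What is her name)] ],
);

# Make the structure recursive to show that Storable copes with cycles.
$state{self} = \%state;

# Throwaway file name for the demo; removed automatically at exit.
my (undef, $file) = tempfile(UNLINK => 1);

store(\%state, $file);         # write the whole structure in one call
my $copy = retrieve($file);    # read it back as a fresh structure

print "tag for 'is': $copy->{dictionary}{is}{tag}\n";
print "third word:   $copy->{sentences}[0][2]\n";
```

Data::Dumper works the same way in spirit, except that its output is readable Perl text, which is handy when you want to eyeball the structure while debugging.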
In this case, I'd suggest you keep the annotations in separate data structures from the original input. For example, store each input sentence as an array of words. Use a second array of hashes to store the annotations. Or use multiple arrays (one for each attribute) if you're uncomfortable with hash references. Generic word dictionaries should be indexed by word. Specific input sentences can be indexed by word offset within the sentence. If you change the sentences, you might find splice() useful. Just remember to also splice the corresponding annotation arrays so that the word offsets remain correct.
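A rough sketch of the parallel-array idea follows. The `replace_words` helper and the `tag` key are hypothetical names, and the tag values simply reuse the labels from your example:

```perl
use strict;
use warnings;

# One input sentence as an array of words.
my @words = qw(What is her name);

# Parallel array of hashes: $notes[$i] annotates $words[$i].
my @notes = (
    { tag => 'WP'  },
    { tag => 'VBZ' },
    { tag => 'NP'  },
    { tag => 'NP'  },
);

# Hypothetical helper: replace $length words starting at $offset and
# splice the annotation array identically so the offsets stay aligned.
sub replace_words {
    my ($offset, $length, $new_words, $new_notes) = @_;
    die "word/annotation count mismatch" unless @$new_words == @$new_notes;
    splice @words, $offset, $length, @$new_words;
    splice @notes, $offset, $length, @$new_notes;
    return;
}

# Replace "her name" (offsets 2 and 3) with three words.
replace_words(
    2, 2,
    [ qw(the girl's name) ],
    [ { tag => 'NP' }, { tag => 'NP' }, { tag => 'NP' } ],
);

print "$_: $words[$_] ($notes[$_]{tag})\n" for 0 .. $#words;
```

Doing both splices inside one sub means you cannot forget the second one.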
BTW, you might want to check out some other Eliza programs in Perl:
merlyn's IRC Eliza article and
Chatbot::Eliza
I think if you posted some code, I could understand your problem a bit better...
Oh, and you should take a few minutes to register a PM account too.
work it harder make it better do it faster makes us stronger more than ever hour after our work is never over.
Okay, I figured out what you are doing, though the abbreviations are beyond me (what does VBZ stand for? I give up! :) It took a while to realize that your slashes are not delimiters, so "What" is the WP, "is" is the VBZ, "her name" is the NP, and "(CI)" must be the exclamation, I guess.
You might want to first analyze the input sentence as a pattern, and then keep a set of rules for what to do when you see a certain pattern. The logic is the hard part, but you can't ignore it. You might also start with the more complex patterns and work towards simpler ones, so that if you get a really weird one you can answer with the least common denominator, like "Huh?". What happens if I type "Do you know her name?" or "Her name was.. I forget?" or "Tell me the girl's name please."?
You could use a regular expression to scan for supported nouns, like "if ($s =~ /name/) {...}", or you can analyze the sentence into a query which has a subject, verb, and object, then just look up the object and invert. The problem with your current setup is that you never really do any analysis, so you keep getting more and more complex keys. For example, it seems to me you should not have "her" included as part of a key. Do you intend to make a new key for every pronoun in the book? With this kind of series of exploding ramifications, you can't win. Every time you want to take a new step your data structure will grow exponentially. That is the problem you sense at the end of your post ("how do I extract..").
One way you could simplify is to analyze the first line of your dialog as a "What-is-X" pattern: if ($s =~ /what is (.+)$/i) { ... }. The captured text ends up in $1; copy it to a variable such as $x, and you can then divide $x into "her" and "name", or something more general if you like.
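Here is a small sketch of that What-is-X match. The `parse_what_is` name is made up, and stripping trailing punctuation and splitting on whitespace are my own assumptions about what "dividing X" should mean:

```perl
use strict;
use warnings;

# Hypothetical helper: return the words of X if the input looks like
# "What is X", or an empty list if it does not.
sub parse_what_is {
    my ($s) = @_;
    # Capture X lazily so that trailing punctuation is left out of it.
    return unless $s =~ /\bwhat\s+is\s+(.+?)\s*[?.!]*\s*$/i;
    return split ' ', $1;
}

my @x = parse_what_is("What is her name?");
print join('|', @x), "\n";    # prints "her|name"

# One possible simplification: treat the last word as the thing asked about.
my $object = $x[-1];
print "object: $object\n";    # prints "object: name"
```

Once you have the bare noun, "her" no longer needs to be part of any hash key.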
You can store patterns in an array or hash and evaluate them in a loop. As long as your program actually understands what is being requested ("name"), there is no need to go through another set of hashes. It would save time if you could split things up into different hashes instead of composing keys, since the number of keys then adds up instead of multiplying. Fill in the space with program logic instead.
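The loop over stored patterns might look something like the sketch below. Everything in it is an assumption for illustration: the `%facts` hash and its 'Alice' entry, the `lookup` and `respond` subs, and the particular regexes, which just take the last word of the request as the noun:

```perl
use strict;
use warnings;

# Hypothetical knowledge, indexed by the bare noun rather than by "her name".
my %facts = ( name => 'Alice' );

# Answer a request for one noun.
sub lookup {
    my ($noun) = @_;
    my $key = lc $noun;
    return exists($facts{$key})
        ? "The $key is $facts{$key}."
        : "I don't know the $key.";
}

# Rules are tried in order, most specific first; the last one always matches.
my @rules = (
    [ qr/\bwhat\s+is\s+.*?(\w+)\W*$/i           => \&lookup ],
    [ qr/\bdo\s+you\s+know\s+.*?(\w+)\W*$/i     => \&lookup ],
    [ qr/\btell\s+me\s+.*?(\w+)\s+please\W*$/i  => \&lookup ],
    [ qr/.*/s                                   => sub { "Huh?" } ],
);

sub respond {
    my ($input) = @_;
    for my $rule (@rules) {
        my ($pattern, $action) = @$rule;
        # In list context a successful match returns its captures.
        if (my @captures = $input =~ $pattern) {
            return $action->(@captures);
        }
    }
    return "Huh?";    # only reached if the catch-all rule is removed
}

print respond($_), "\n" for (
    "What is her name?",
    "Do you know her name?",
    "Tell me the girl's name please.",
    "Her name was.. I forget?",
);
```

Adding a new sentence shape is then one more line in @rules, and the hash of facts stays small.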
Anyway, whatever strategy you use, if you provide some kind of reducing logic step you will be able to keep your data from exploding. Hashes can be really sexy but you need to put some smartness into them or they will just go wild all over you. (Hmm, I best stop here. :)