Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

Hello there,
I am working on a part-of-speech "dialog" program that has to retrieve a specific value according to the recomposition of its key. My problem is that I have so many transition tables, whose values are keys in other tables and so on, that I wonder whether I am structuring the solution correctly. Could you please give me some advice on this?

There are two original files. The first has a series of lines with the dialog (questions or answers) already tagged: What/WP is/VBZ her name/NP?. The second has a series of complementary expressions (Maybe, Sure, Ohh, Oops, etc.) associated with each dialog in the first file.

I extracted the tags from the first file, creating unique composed keys such as WP-VBZ-NP, which I match against the keys in a hash table (1). This hash table returns a value X1, which is the key in a hash of arrays (2) whose value is an array {y1, y2, y3} of possible answers or questions for that dialog. The values of that array are keys in another hash table (3), whose values are the recomposed keys for the specific dialog in the first (original) file: y1 -> (CI)-NP-VBZ

I hope this is not getting too long and complicated; well, I am going crazy.

To finish, I take the value I got from the last table (3), the recomposed key (CI)-NP-VBZ, go back to the original file (held in an array or a hash?) with its tag/value pairs, extract the values in the order the recomposed key suggests, and concatenate the value from the second file.

At the end the dialog goes like this:

What is her name?
Oops, her name is..
You don't know her name?!!
I don't know, maybe is..


So, my questions: Is the structure of the solution OK? Is the number of transition tables (hashes or arrays) OK? And last, how do I extract the value for a specific tag according to the recomposed key? Should I put each line of the original tagged file, with its tag/value pairs, in an array, or in another hash (key -> tag, value -> value)? Thanks in advance for your patience and any help!!!

Replies are listed 'Best First'.
Re: Checking Composed unique keys using hash tables
by blssu (Pilgrim) on Sep 20, 2002 at 20:37 UTC

    It sounds like you know what algorithms you want to use, but you're fighting the infrastructure -- sometimes very simple ideas end up surrounded by huge amounts of I/O and user interface code.

    During development, it helps to forget about I/O and just use something like Data::Dumper or Storable. Those modules can save whatever data structures you use -- including nested and recursive hashes. You don't have to fight the file I/O until you've solved your application problems. Your data structures remain fluid.
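    As a rough illustration of that idea, here is a minimal sketch that dumps a nested hash with Data::Dumper and round-trips it through Storable. The table contents (X1, y1..y3) are placeholders borrowed from the question, and the hash layout is only an assumption, not the original poster's actual structure.

```perl
use strict;
use warnings;
use Data::Dumper;
use Storable qw(store retrieve);
use File::Temp qw(tempfile);

# Hypothetical nested structure: a composed key maps to a state and
# a list of candidate replies (placeholder values from the question).
my %tables = (
    'WP-VBZ-NP' => {
        state   => 'X1',
        replies => [ 'y1', 'y2', 'y3' ],
    },
);

# Inspect the structure while developing.
$Data::Dumper::Sortkeys = 1;
print Dumper(\%tables);

# Save it to disk and load it back without designing a file format.
my ($fh, $path) = tempfile(UNLINK => 1);
close $fh;
store(\%tables, $path);
my $restored = retrieve($path);

my $second = $restored->{'WP-VBZ-NP'}{replies}[1];
print "second reply: $second\n";
```

    Because the whole structure is saved and restored in one call, its shape can keep changing during development without touching any parsing code.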

    In this case, I'd suggest you keep the annotations in separate data structures from the original input. For example, store each input sentence as an array of words. Use a second array of hashes to store the annotations. Or use multiple arrays (one for each attribute) if you're uncomfortable with hash references. Generic word dictionaries should be indexed by word. Specific input sentences can be indexed by word offset within the sentence. If you change the sentences, you might find splice() useful. Just remember to also splice the corresponding annotation arrays so that the word offsets remain correct.
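    A minimal sketch of that layout, assuming the tagged format shown in the question (text/TAG, where the text may contain spaces). The helper names parse_tagged and recompose are made up for illustration, and treating "(CI)" as an interjection supplied from the second file is an assumption.

```perl
use strict;
use warnings;

# Split a tagged line such as "What/WP is/VBZ her name/NP?" into two
# parallel arrays: the words, and one annotation hash per word offset.
sub parse_tagged {
    my ($line) = @_;
    my (@words, @notes);
    while ($line =~ m{\s*([^/]+?)/([A-Z]+)}g) {
        push @words, $1;
        push @notes, { tag => $2 };
    }
    return (\@words, \@notes);
}

# Emit the words in the order given by a recomposed key like "(CI)-NP-VBZ".
# $extra supplies text for slots that are not in the sentence (assumption:
# "(CI)" stands for the interjection taken from the second file).
# If a tag occurs twice in a sentence, the later word wins in this sketch.
sub recompose {
    my ($key, $words, $notes, $extra) = @_;
    my %word_for = %{ $extra || {} };
    $word_for{ $notes->[$_]{tag} } = $words->[$_] for 0 .. $#$words;
    return join ' ',
        map { exists $word_for{$_} ? $word_for{$_} : () } split /-/, $key;
}

my ($words, $notes) = parse_tagged('What/WP is/VBZ her name/NP?');
my $answer = recompose('(CI)-NP-VBZ', $words, $notes, { '(CI)' => 'Oops,' });
print "$answer\n";

# Editing the sentence: splice both arrays at the same offset so that
# word offsets and annotations stay aligned.
splice @$words, 1, 0, 'really';
splice @$notes, 1, 0, { tag => 'RB' };
print join(' ', map { $words->[$_] . '/' . $notes->[$_]{tag} } 0 .. $#$words), "\n";
```

    Keeping the words and the annotations in separate arrays means the original sentence is never rewritten just to attach or change a tag.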

    BTW, you might want to check out some other Eliza programs in Perl: merlyn's IRC Eliza article and ChatBot::Eliza
    (On CPAN the module name is spelled Chatbot::Eliza.)

Re: Checking Composed unique keys using hash tables
by tretin (Friar) on Sep 20, 2002 at 18:23 UTC
    I think if you posted some code, I could understand your problem a bit better...

    Oh, and you should take a few minutes to register a PM account too.


    work it harder make it better do it faster makes us stronger more than ever hour after our work is never over.
Re: Checking Composed unique keys using hash tables
by mattr (Curate) on Sep 22, 2002 at 06:30 UTC
    Okay, I figured out what you are doing, though the abbreviations are beyond me (what does VBZ stand for? I give up! :). It took a while to realize that your slashes are not delimiters, so "What" is the WP, "is" is the VBZ, "her name" is the NP, and "(CI)" must be the exclamation, I guess.

    You might want to first analyze the input sentence as a pattern, and then keep a set of rules for what to do when you have a certain pattern. The logic is the hard part but you can't ignore it. You might also start with more complex patterns and work towards simpler ones, so that if you have a really weird one you can answer with the least common denominator, like "Huh?". What happens if I type "Do you know her name?" or "Her name was.. I forget?" or "Tell me the girl's name please."
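    Something along these lines, as a rough sketch only: the rule list, the patterns, and the reply texts are invented for illustration (the replies reuse lines from the sample dialog), and reply_to is a hypothetical helper name.

```perl
use strict;
use warnings;

# Rules are tried in order, most specific first. Each rule is a pattern
# plus a code ref that builds the reply from the captured text.
my @rules = (
    [ qr/^do you know (.+?)\??$/i, sub { "You don't know $_[0]?!!" } ],
    [ qr/^what is (.+?)\??$/i,     sub { "Oops, $_[0] is.." } ],
    [ qr/\bname\b/i,               sub { "I don't know, maybe is.." } ],
);

sub reply_to {
    my ($sentence) = @_;
    for my $rule (@rules) {
        my ($pattern, $action) = @$rule;
        # In list context a match returns its captures, or (1) if the
        # pattern has no capture groups, so @captures is non-empty on success.
        if (my @captures = $sentence =~ $pattern) {
            return $action->(@captures);
        }
    }
    return "Huh?";    # nothing matched: the least common denominator
}

print reply_to($_), "\n"
    for 'What is her name?', 'Do you know her name?',
        "Tell me the girl's name please.", 'xyzzy';
```

    Adding support for a new kind of sentence then means adding one rule to the list, not another table of keys.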

    You could use a regular expression to scan for supported nouns, like "if ($s =~ /name/) {...}", or you can analyze the sentence into a query which has a subject, verb, and object, then just look up the object and invert. The problem with your current setup is that you never really do any analysis, so you keep getting more and more complex keys. For example, it seems to me you should not have "her" included in part of a key. Do you intend to make a new key for every pronoun in the book? With this kind of a series of exploding ramifications, you can't win. Every time you want to take a new step your data structure will grow exponentially. That is the problem you sense at the end of your post ("how do I extract..").
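    To put rough numbers on that growth, here is a small sketch with made-up vocabularies (the word lists are purely hypothetical). It counts how many keys you need when every combination gets its own composed key, versus one small hash per slot.

```perl
use strict;
use warnings;

# Hypothetical vocabularies; the real tag sets will differ.
my @wh    = qw(what who where);
my @verbs = qw(is was);
my @nouns = ('her name', 'his name', 'my name', 'their name');

# One composed key per combination: the count is the product of the sizes.
my %composed;
for my $w (@wh) {
    for my $v (@verbs) {
        $composed{"$w-$v-$_"} = 1 for @nouns;
    }
}
my $composed_count = scalar keys %composed;

# One small hash per slot: the count is the sum of the sizes.
my %by_slot = (
    wh   => { map { $_ => 1 } @wh },
    verb => { map { $_ => 1 } @verbs },
    noun => { map { $_ => 1 } @nouns },
);
my $separate_count = 0;
$separate_count += scalar keys %$_ for values %by_slot;

print "composed keys: $composed_count, separate keys: $separate_count\n";
```

    Every extra slot multiplies the composed count but only adds to the separate one, which is why the composed tables get out of hand so quickly.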

    One way you could simplify is to analyze the first line of your dialog as a "What-is-X" pattern: if ($s =~ /what is (.+)$/i) {...}, and then you can divide X (captured in $1) into "her" and "name", or something more general if you like. You can store patterns in an array or hash and evaluate them in a loop. As long as your program actually understands what is being requested ("name") there is no need to go through another set of hashes. It would save time if you could split up into different hashes instead of composing, since you would have far fewer keys to make. Fill in the space with program logic instead.
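    A minimal sketch of the "What-is-X" analysis, under some assumptions: the pronoun list is a small hand-picked sample, analyze and answer are hypothetical helper names, and the reply text is taken from the sample dialog.

```perl
use strict;
use warnings;

# A small, hand-picked list of possessive pronouns (assumption).
my %pronoun = map { $_ => 1 } qw(my your his her its our their);

# Reduce a "What is X?" sentence to its parts. Returns nothing when the
# sentence does not fit the pattern.
sub analyze {
    my ($s) = @_;
    return unless $s =~ /what is (.+?)\s*\??$/i;
    my @x = split ' ', $1;
    my $owner = (@x > 1 && $pronoun{ lc $x[0] }) ? lc(shift @x) : undef;
    return { pattern => 'what-is-X', owner => $owner, object => join(' ', @x) };
}

# Build a reply from the analysis; no per-pronoun key is needed.
sub answer {
    my ($s) = @_;
    my $q = analyze($s) or return 'Huh?';
    my $subject = join ' ', grep { defined } $q->{owner}, $q->{object};
    return "Oops, $subject is..";
}

print answer($_), "\n"
    for 'What is her name?', 'What is his name?', 'Tell me a story.';
```

    Here "her" and "his" go through the same code path, so the object ("name") is the only thing the program has to look up.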

    Anyway, whatever strategy you use, if you provide some kind of reducing logic step you will be able to keep your data from exploding. Hashes can be really sexy but you need to put some smartness into them or they will just go wild all over you. (Hmm, I best stop here. :)