in reply to Developing an Expert System/Intelligent Agent for PM?

Here are some more thoughts on this idea, at least implementation-wise:

First, this can be made sufficiently generic that I'd consider developing it as a module rather than as something specific to PM.

As for how to do it, there are two possible ways. The first is messy: assume that every question has no more than N possible keywords, so that when a question is asked, you extract the N most important ones (importance as determined by someone else). We can then simply use an N-dimensional table, each entry being a weighted list (e.g. a list of hashes) sorted by importance. When the response satisfies the question, the response gets a bit more weight; when it doesn't, it loses some. While this is 'easy' to do, a list of 1000 keywords (reasonable) with 3 keywords per question requires 1000^3, or a billion, storage bins. Not impossible, but still messy.
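To make the table idea concrete, here is a rough Python sketch. It is only an illustration under assumptions that are not in the post: the names `bin_key`, `feedback` and `ranked`, the starting weight of 1.0, and the step of 0.1 are all made up, and a dict keyed by keyword tuples stands in for the dense N-dimensional table so that only bins actually used take up space.

```python
N = 3  # keywords kept per question (assumed value, matching the example above)

# Sparse stand-in for the N-dimensional table: keyword tuple -> {response_id: weight}
table = {}

def bin_key(keywords, importance):
    """Keep the N most important keywords and return them in a canonical order."""
    top = sorted(keywords, key=lambda k: importance.get(k, 0.0), reverse=True)[:N]
    return tuple(sorted(top))

def feedback(key, response_id, helpful, step=0.1):
    """Add weight when the response satisfied the question, remove some otherwise."""
    weights = table.setdefault(key, {})
    current = weights.get(response_id, 1.0)  # assumed starting weight
    weights[response_id] = current + (step if helpful else -step)

def ranked(key):
    """Responses stored in a bin, heaviest first."""
    weights = table.get(key, {})
    return sorted(weights, key=weights.get, reverse=True)

# Size of a fully dense table for 1000 keywords and 3 keywords per question.
dense_bins = 1000 ** N
```

Sorting the key means the same three keywords always land in the same bin regardless of the order they were asked in; a dense table indexed by position would not get that for free.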

A better way, though a bit trickier to program, would be to use a tree structure. Each node would hold a list of the responses that contain at least that node's keyword and the keywords of every parent up to the root. The children would be stored as a sorted list; more details later. The initial tree would simply be one root plus one child node for each keyword, with each keyword knowing which messages it was in.

When a new question is asked, the keywords are extracted into a list sorted from most important to least important. Starting with the children of the root node (which are kept in order, remember), the first keyword is looked for; if found, we go to that child and start the process again with the next keyword. If the keyword is not found, we go to the next keyword on the list and try again. If we exhaust the list of keywords from the question, the responses associated with the current node are presented to the user (if we are still at the root node, we simply say "nothing found").

Now, if we descend to a node with no children but still have keywords left from the question, we present the responses for that node and then for its parents, in order, and ask the user whether the response was helpful or not. If yes, we take the next keyword from the question list, add it as a new child node of the current one, and move the response to the new node's list. If no, we take the next keyword from the response's own sorted keyword list that is not in the question keyword list, and do the same. When a new node is created from an existing one, all other responses of the existing one are evaluated as well and moved as appropriate.
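The walk described above could be sketched like this in Python. Everything here is an assumption made for illustration (the `Node` class, `lookup`, `responses_for` and `grow` are invented names), and the sketch leaves out the last step, re-evaluating the other responses of the node that was split.

```python
class Node:
    """One keyword in the tree; the root has keyword None."""
    def __init__(self, keyword=None, parent=None):
        self.keyword = keyword
        self.parent = parent
        self.children = {}   # keyword -> Node
        self.responses = []  # response ids stored at this node

    def child(self, keyword):
        """Return the child for this keyword, creating it if needed."""
        if keyword not in self.children:
            self.children[keyword] = Node(keyword, self)
        return self.children[keyword]

def lookup(root, keywords):
    """Walk down using keywords sorted most important first.

    Returns (node, leftover). leftover is non-empty only when we stopped at
    a childless node with question keywords still unused.
    """
    node = root
    for i, kw in enumerate(keywords):
        if node is not root and not node.children:
            return node, keywords[i:]
        nxt = node.children.get(kw)
        if nxt is not None:   # found: descend; otherwise just try the next keyword
            node = nxt
    return node, []           # ending at root means "nothing found"

def responses_for(node):
    """Responses of the node, then of each parent up to the root, no repeats."""
    seen, out = set(), []
    while node is not None:
        for r in node.responses:
            if r not in seen:
                seen.add(r)
                out.append(r)
        node = node.parent
    return out

def grow(node, keyword, response):
    """Add keyword as a new child of node and move the response into it."""
    new = node.child(keyword)
    if response in node.responses:
        node.responses.remove(response)
    if response not in new.responses:
        new.responses.append(response)
    return new
```

When `lookup` returns leftover keywords and the user says the response helped, `grow(node, leftover[0], response)` is the "yes" branch; for the "no" branch the keyword would instead come from the response's own keyword list.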

Note that the same keyword can appear many times in the tree, and messages will appear multiple times as well.

The keyword importance is important here -- it should be inversely related to the number of responses that all nodes of that keyword (and their children) are associated with. That is, less-used keywords are more important than oft-asked ones, so that their questions will be answered first. This re-evaluation can be done periodically (once a day, for example).
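One way the periodic re-evaluation might look, as a hedged Python sketch. The post only says "inversely related", so the formula 1 / (1 + count) is an assumption, as are the dict-based node layout and the function names.

```python
from collections import defaultdict

def subtree_responses(node):
    """All distinct responses stored at a node or anywhere below it."""
    found = set(node["responses"])
    for child in node["children"]:
        found |= subtree_responses(child)
    return found

def keyword_importance(root):
    """Map each keyword to an importance that falls as its response count grows."""
    per_keyword = defaultdict(set)
    stack = list(root["children"])
    while stack:
        node = stack.pop()
        # The same keyword can sit in several places; pool all of them.
        per_keyword[node["keyword"]] |= subtree_responses(node)
        stack.extend(node["children"])
    # Assumed formula: any decreasing function of the count would fit the post.
    return {kw: 1.0 / (1 + len(rs)) for kw, rs in per_keyword.items()}

# Tiny made-up tree: "regex" covers three responses, "cgi" only one.
tree = {"keyword": None, "responses": [], "children": [
    {"keyword": "regex", "responses": ["r1", "r2"], "children": [
        {"keyword": "greedy", "responses": ["r3"], "children": []},
    ]},
    {"keyword": "cgi", "responses": ["r4"], "children": []},
]}
importance = keyword_importance(tree)
```

With this toy tree, "cgi" ends up more important than "regex", which is the ordering the paragraph above asks for.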

New keywords can easily be added by adding that keyword at a very high importance at the top level, with all responses that that keyword is in added to the node's response list. As time progresses, the keyword and responses will be distributed appropriately.

Adding new responses is a bit trickier. A list of keywords from the new response should be generated, and every branch that matches a keyword from that list should be followed, placing the message in the lowest possible node.
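A possible reading of that placement rule, sketched in Python. The dict-based node layout and the name `place` are assumptions for illustration; in particular, the sketch silently does nothing when no top-level keyword matches, since brand-new keywords are handled separately.

```python
def place(node, response, keywords):
    """Follow every child whose keyword is in the response's keyword set,
    and store the response at the deepest node reached on each branch."""
    matches = [c for c in node["children"] if c["keyword"] in keywords]
    if not matches:
        # Lowest possible node on this branch; never store at the root itself.
        if node["keyword"] is not None and response not in node["responses"]:
            node["responses"].append(response)
        return
    for child in matches:
        place(child, response, keywords - {child["keyword"]})

# Made-up tree for the example.
greedy = {"keyword": "greedy", "responses": [], "children": []}
regex = {"keyword": "regex", "responses": [], "children": [greedy]}
cgi = {"keyword": "cgi", "responses": [], "children": []}
root = {"keyword": None, "responses": [], "children": [regex, cgi]}

place(root, "r9", {"regex", "greedy", "hash"})
```

Here the response lands only in the "greedy" node under "regex"; the "regex" node itself and the unrelated "cgi" branch are left alone.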

Obviously storage would be an issue, but a straightforward SQL database would do the job nicely. The requirements should not be as great as for the N-dimensional array system, since it's not expected that every keyword will appear in a question with every other keyword.
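For what the SQL side might look like, here is a small sketch using Python's built-in `sqlite3` with an in-memory database. The table and column names (`node`, `node_response`, `parent_id` and so on) and the helper functions are invented for this example; any SQL database with a parent-pointer table would serve the same purpose.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE node (
    id        INTEGER PRIMARY KEY,
    parent_id INTEGER REFERENCES node(id),  -- NULL for the root
    keyword   TEXT,                         -- NULL for the root
    UNIQUE (parent_id, keyword)
);
CREATE TABLE node_response (
    node_id     INTEGER NOT NULL REFERENCES node(id),
    response_id INTEGER NOT NULL,
    PRIMARY KEY (node_id, response_id)
);
""")

def add_node(conn, parent_id, keyword):
    """Insert a node and return its id."""
    cur = conn.execute(
        "INSERT INTO node (parent_id, keyword) VALUES (?, ?)",
        (parent_id, keyword),
    )
    return cur.lastrowid

def find_child(conn, parent_id, keyword):
    """Id of the child of parent_id carrying this keyword, or None."""
    row = conn.execute(
        "SELECT id FROM node WHERE parent_id = ? AND keyword = ?",
        (parent_id, keyword),
    ).fetchone()
    return row[0] if row else None

def responses_at(conn, node_id):
    """Response ids stored at one node."""
    rows = conn.execute(
        "SELECT response_id FROM node_response WHERE node_id = ? ORDER BY response_id",
        (node_id,),
    )
    return [r[0] for r in rows]
```

Only nodes that actually exist get a row, which is where the saving over the N-dimensional array comes from.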

Note that this mimics the "Guess the Animal" game that has persisted from the start of computer programming, where you use a binary tree to distribute knowledge around.

Most of these are just ideas, and I haven't attempted to put anything to code yet, so I'm just airing them to see if they sound reasonable...


Dr. Michael K. Neylon - mneylon-pm@masemware.com || "You've left the lens cap of your mind on again, Pinky" - The Brain

Re: Re: Developing an Expert System/Intelligent Agent for PM?
by jynx (Priest) on Jun 16, 2001 at 03:07 UTC

    Haven't done AI in a couple months, but here are a couple concerns off the top of my head:

    First of all, you're talking tree structure, so now you have a reasonable search space; do you want to prune that search space? Before you start iterating through the search space for applicable data, are there some simple algorithms that will lower the number of calculations that need to take place? Maybe the keyword tree can be pruned after a dozen keywords because that's enough information to find an answer in, say, 95% of the queries. That would stop the algorithm from going off into the deep end with 20+ keywords when it doesn't need to.

    This is just an example prune, but it's my general opinion that an AI shouldn't come back and say "well, i looked for stuff relevant to what you're looking for and found half the universe, so if you'd just care to browse through and tell me what was appropriate". That's one of the main problems with search engines sometimes, too much information.

    The other major concern is how often it gets updated. You suggested once a day. This is fine if the search AI gets used a lot, because we shouldn't waste time re-analyzing everything while other people are searching. But if it only gets used around 20-50 times a day, it would probably be better to re-analyze more often, so that a query in the afternoon won't return the same (possibly irrelevant) information as one in the morning, which would generally help everyone get information faster.

    The tree structure seems to be the best method off the top of my head. My brain's slightly dead right now from finals, or i would actually sit down and try to plan out something else for comparison other than the brute force N-dimensional table approach you already posted. If i get some time later this week i may try, but the tree you suggest seems a good choice for representing the data in a manageable way.

    Good idea, seems reasonable, the module would probably be interesting to see whether it gets used on perlmonks or not, but that may just be me :-)

    HTH,
    jynx