in reply to Comparitive Scoring System

I don't know of any modules, but fuzzy logic includes a concept called 'hedging' that may work for you.

Technically, a hedge modifies a fuzzy value. As a rough example, if 'old' means 'more than 50', the hedge 'very' would make 'very old' mean 'more than 80'. Individually, hedges provide a convenient way to model human terms like 'roughly', 'mostly', 'vaguely' and so on.

Hedges tend to have the same effect on any modificand. The hedge 'very' does the same thing to 'very old' that it does to 'very tall', 'very close', 'very green', etc. That uniformity produces an interesting side-effect when you compare hedges to each other.

The relative effect of two hedges tends to remain uniform regardless of modificand. 'Very' is always bigger than 'mostly', regardless of what you modify. That means you can build a set of hedges that form a reliable ordinal scale (one that defines an order, i.e.: this one comes before that one) for any set of terms.

In other words, hedges let you mark off two endpoints of a line -- 'absolutely X' and 'absolutely not X' -- then define a reliable sequence of positions along that line. And once you have a reliable sequence, you can sort things.

Now for the sake of vocabulary, let's call that line an axis. If we use more than one value for X, we create a space with multiple axes. We could make a three-dimensional space with axes for 'big', 'green', and 'loud', for instance. Then we can define objects with 3-dimensional points that represent their position in that space. A regular frog would sit at the low end of all three axes, while a 50' heavy-metal Irish frog would go to the high end of all three. The ocean would score high for 'big' and medium for 'green' and 'loud'.

To select objects that are similar, we simply chop out a region whose center falls at a given point. If you want to be simple, you can make the region an N-dimensional cube.. anything within two ticks of point 'the sea' on any axis, for instance. If you want to be fancy, you can make it an N-dimensional sphere.

Finally, you can hedge the modificand 'closeness' to determine the size of the region. 'very close' would generate a small region, while 'sorta close' would generate a larger one.

Algorithmically, this reduces your problem to range evaluation, which is a known subject. Some databases (postgresql) even have geometric operators that will decide whether a given point falls inside a given region. The big job will be to choose a set of qualities to use as axes, then to assign a position in that space for every project in your table.

Hope that at least gives you some ideas to kick around.

Replies are listed 'Best First'.
Re: Re: Comparitive Scoring System
by mstone (Deacon) on Jan 03, 2002 at 03:30 UTC

    Just remembered something else that might be useful..

    When it comes time to rate projects in terms of qualities X1..Xn, don't ask people to look at a single project and assign numbers for each axis. The numbers you get will vary from person to person, according to context. Most people would say that a 50' frog is big, for example. But those same people would also say that the ocean is big, and that the entire known universe is big. If you ask people rate the 'bigness' of those three items, the numbers you get will vary based on the order in which they're presented.

    OTOH, pretty much everyone will agree that the ocean is bigger than a 50' frog, but smaller than the known universe. In other words, people's perception of relative X-ness tends to be more reliable than their absolute rating.

    So instead of rating projects in isolation, give people two projects and ask them to compare them in terms of qualities X1..Xn. For each pair, you'll get a set of n more/same/less ratings that place the two projects relative to each other. Then once you collect enough ratings, you can distribute items along each axis by sorting them topologically.

    It is possible that the ratings you collect will be inconsistent, especially if you have more than one person ranking projects. One person may say P1 is bigger than P2, another may say P2 is bigger than P3, and a third may say P3 is bigger than P1. Topologically, that's a loop, and will screw you up if you try to convert ratings to absolute positions along the scale.

    You solve that problem by collecting more data, and making the relative values fuzzy. If 60% of people say P1 is bigger than P2, you say that 60% of P1 is bigger than P2, but 40% is smaller. If 30% of people say P2 is bigger than P3, P2 outs P3 by a 30/70 margin. If only 10% of people say that P3 is bigger than P1, it means that 10% of P3 is still bigger than some part of P1:

    P1 ---------- P2 ---------- P3 ----------

    Yes, that's a contrived example.. I'm doing text-graphs, here. ;-) The same idea applies for less typeable values, though.

    If you really want to obsess about it, you can generate a range and distribution function for each item along each axis, which is pretty much the definition of a fuzzy point. Then you can select a region from the space, and rate projects in terms of how much each one falls inside that region.

    Hope I haven't stepped over the line into hopeless incomprehensibility.

      If you really want to obsess about it, you can generate a range and distribution function for each item along each axis, which is pretty much the definition of a fuzzy point. Then you can select a region from the space, and rate projects in terms of how much each one falls inside that region.

      This is where I am at on it. I was hoping something would save me the pain of this process. I can't figure out how to do the region, other than map each zip code to a lat/long then do a circle distance from the new project point.

      Thanks for brain dump

      g_White