PerlMonks  

More useful "best" and "worst" nodes display

by deprecated (Priest)
on May 11, 2004 at 19:58 UTC ( [id://352570]=monkdiscuss )

Back when I was a kid, we only had five users, and all those users were mere acolytes. And when we posted a new node, the highest rep we could get was five. Now, with zillions of new users at the monastery, the number of votes to be cast in a given day is obscene. Couple that with the incentive to spend votes to gain XP, and you have entirely (well, almost entirely) invalidated the voting system, at least as regards the "worth" of individual posts.

From time to time, I come back to the monastery to update my information, and to read the best and worst nodes in the recent period. Frequently, I do enjoy reading the worst posts (there is, after all, much humor in them). However, I have found that over time, the "best" nodes are really not so spectacular.

I think I have a solution for this.

It should be possible to derive either the norm at a given time, or the number of users at a given time and extrapolate the number of votes typically castable in a given day per post. If one were to weigh either of those numbers against the actual reputation of a given node, the resultant number would be much more indicative of the value of said node.
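
A rough sketch of that weighting, assuming the norm in effect when a node was posted were recorded somewhere (it may not be). The helper name relative_rep and all the numbers are made up for illustration:

```perl
use strict;
use warnings;

# Hypothetical helper: scale a node's reputation by the norm that was
# in effect when it was posted. Both inputs are assumptions; the site
# would have to keep a history of the norm for this to work.
sub relative_rep {
    my ($rep, $norm_at_post) = @_;
    return unless $norm_at_post;    # no norm recorded, or zero: no score
    return $rep / $norm_at_post;
}

# Made-up numbers: an old node with rep 5 when the norm was 1, and a
# recent node with rep 30 when the norm was 10.
my $old    = relative_rep(5, 1);
my $recent = relative_rep(30, 10);

printf "old: %.2f  recent: %.2f\n", $old, $recent;
```

With these invented figures the old node ranks above the recent one even though its raw reputation is much lower, which is the kind of comparison being asked for.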

Presently, comparing today's nodes to yesterday's nodes is simply futile. The ratios are so different that comparison is worthless. Fortunately, the same could be said of "bad" nodes. I'd love to read some of the worst nodes ranked by the ratio of node value to possible value.

Patches welcome, I'm sure. Perhaps tilly could do it.

dep

--
Tilly is my hero.

Replies are listed 'Best First'.
Re: More useful "best" and "worst" nodes display
by chromatic (Archbishop) on May 11, 2004 at 20:14 UTC
    If one were to weigh either of those numbers against the actual reputation of a given node, the resultant number would be much more indicative of the value of said node.

    Unfortunately, there's no time limit to when someone can vote on a node. Which $NORM do you use?

      I would think that most nodes get the majority of their votes in the first day to maybe a week after they are posted, when they are on the Newest Nodes or the Front Page. After they roll off those, the chance of getting a vote is a lot less.
      If you base it off the $NORM for that day or week (possibly weighted more towards the day it was posted), you should account for the bulk of the voting done on that node.
      If a node continues to get votes after this time it makes sense to rank it higher (or lower for down votes), since that would indicate (to me at least) a strong staying power or usefulness of that node.
      The $NORM for the day the vote was cast seems a good value. While this will involve quite a lot of calculations, something like:
      my $weighted_norm = 0;
      foreach my $day (@days) {
          # weight each day's norm by the share of the node's votes cast that day
          $weighted_norm += $norm_of{$day} * $votes_on{$day} / $total_votes;
      }
      could calculate the weighted norm. I'm afraid this will be too big a burden (CPU-time) to implement.
Re: More useful "best" and "worst" nodes display
by atcroft (Abbot) on May 11, 2004 at 20:19 UTC

    One comment on the idea. In your fourth paragraph, you said:

    "It should be possible to derive either the norm at a given time, or the number of users at a given time and extrapolate the number of votes typically castable in a given day per post. If one were to weigh either of those numbers against the actual reputation of a given node, the resultant number would be much more indicative of the value of said node."

    Reading that, it occurred to me that (if memory serves) each user can only vote on a node once, although (I believe) they can vote on almost any node present. Therefore, if you are considering the proposed solution, it would seem that you might then derive a percentage for the node as the number of votes received to the number of monks having existed in total. Then, at that point, one could look at those with the highest/lowest such ratios, or even attempt to determine the standard deviation of those values and look at the extremes....

    Interesting idea to toy with, if nothing else.
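
    A minimal sketch of that ratio and the standard-deviation check, with invented vote counts and an invented monk total (none of these figures come from the site):

```perl
use strict;
use warnings;
use List::Util qw(sum);

# Hypothetical data: node id => votes received. $total_monks is invented too.
my %votes_for   = (a => 12, b => 40, c => 7, d => 95);
my $total_monks = 1000;

# Votes received as a fraction of all monks.
my %ratio = map { $_ => $votes_for{$_} / $total_monks } keys %votes_for;

my @r    = values %ratio;
my $mean = sum(@r) / @r;
my $sd   = sqrt( sum( map { ($_ - $mean) ** 2 } @r ) / @r );    # population std dev

# Flag nodes whose ratio is more than one standard deviation from the mean.
my @extremes = sort { $a cmp $b }
               grep { abs($ratio{$_} - $mean) > $sd } keys %ratio;

print "extremes: @extremes\n";
```

    The one-standard-deviation cutoff is arbitrary; a real list would probably want a stricter threshold.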

      derive a percentage for the node as the number of votes received to the number of monks having existed in total
      You'd probably have to check for votes received to the number of monks who logged in that day/week/etc. There are lots, and lots, and lots of monks who have never even cast a vote.

      Perhaps it might make more sense to just keep track of the total number of votes cast in a day, and compare that to the node rep? (Isn't that basically how the xp system works anyways?)
Re: More useful "best" and "worst" nodes display
by graff (Chancellor) on May 12, 2004 at 03:15 UTC
    There is one gap in your logic -- one missing piece of information -- that is likely to lead to results that are unexpected, unintended, and/or unsatisfying:

    The relative quantity of votes a node receives is determined primarily by whether or not its thread appears on the front page (The Monastery Gates). If a node is in a front-paged thread, it gets lots of votes -- potentially hundreds (whatever their polarity); if not, it will rarely show up anywhere in Best Nodes (or Worst Nodes) -- and of course, those two special pages will tend to amplify the nodes that get there, which in turn amplifies the difference between front-page and non-front-page threads.

    Of course, there's a circularity here: people decide to front-page a thread because they think it's really good, and so it gets a lot more votes, which reinforce that view. So maybe what would work best, in addition to (or instead of) the "Best Nodes" page is simply a way to locate the threads that used to be on the front page, but have since been pushed off. The only problem then is to figure out how to find the nodes that deserved to go to the front page, but didn't (just because no one decided to do that).

    Normalizing the vote rankings of nodes according to whether or not their threads are front-paged, as well as other factors mentioned above, could have a positive effect on the perceived "acuity" of the rankings -- but this would be hard to verify (there's no accounting for taste...); it might also have a noticeable negative effect on server performance, and this could probably be proven beyond doubt.

      What's worse, the boost for a node showing up at the Gates is not a crisp quantum effect.

      • If a main node gets front-paged before it gets many votes, it can moderately enhance the boost effect factor.
      • If a main node gets front-paged early in the day, USA time, it can greatly enhance the boost effect factor.
      • The first few follow-up nodes get a lot more votes than the rest of the follow-ups.
      • If a follow-up node is created before the main node gets front-paged, the follow-up node can attract a lot more votes than if the follow-up node is created later.
      These factors are all active, regardless of the actual "quality" of the node involved.

      Quality-related factors are prevalent, too. For example, an RTFM follow-up can win over an RTFM-bait question, or an insightful and informative follow-up can garner points over an RTFM follow-up.

      --
      [ e d @ h a l l e y . c c ]

Re: More useful "best" and "worst" nodes display
by hossman (Prior) on May 12, 2004 at 06:20 UTC

    I've spent a lot of time considering various ways of implementing a "most popular" list for various systems that involve user rankings of content. In my opinion there are two good solutions, one of which is much better than the other -- but much more complex. Both work equally well for votes with boolean value (ie: 1), perlmonks style +/- votes (a value of -1 or +1), or scale votes (ie: a value of 1..10).

    Unfortunately both require recalculating the score of every piece of content on a regular basis (even if it has had no new votes).

    • Weighted by Age of Content

      This one is fairly straightforward. You start by picking a unit of time (say: 1 day, or 1 week). For each $item determine the age in that unit: $item->age() and its total number of votes: $item->totalVotes().

      You also need to pick a weighting function weight() -- it can be any function of $age, the simplest one being a linear function with constant slope...

      sub weight { my $age = shift; return $age / $FACTOR; }
      The overall popularity score of each item becomes:
      $score = $item->totalVotes() * weight($item->age());

    • Weighted by Age of Vote

      This is the complex one. Instead of applying a weight function to the age of the item, you apply a weight function to the difference between the age of the item and the age of each vote. Of course, this requires that you have some way to get a list of every vote ever cast for an item: $item->votes(), and the date of each vote so you can calculate the age: ($item->votes())[$i]->age()

      $score = 0;
      foreach $vote ($item->votes()) {
          $score += $vote->value() * weight($item->age() - $vote->age());
      }

    The first method, with the linear weighting function I mentioned, leans towards promoting either recent items or older items (depending on the value of $FACTOR). The second, however, can be used to lean towards items which continue to earn votes long after they were originally written.

    In both cases, changing the weight function to something that is not so linear can allow you to tune it to perfection -- including adding bias at certain dates when you know there were radical shifts in voting rules (ie: on this date, the number of votes per person was significantly increased, so add more bias to votes cast before that date)
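
    As a rough illustration of the second method with a non-linear weight, here is a self-contained sketch. Plain hashes stand in for the $item and $vote objects, and the logarithmic curve is only an assumed example of "not so linear", not a recommendation:

```perl
use strict;
use warnings;

# Assumed non-linear weight: a vote cast on the day of posting counts 1,
# later votes count more, with diminishing returns. This curve is an
# illustration only.
sub weight {
    my $delay = shift;    # days between posting and the vote
    return 1 + log(1 + $delay);
}

# Hypothetical item: ages are in days; each vote has a value and an age.
my %item = (
    age   => 30,
    votes => [
        { value =>  1, age => 30 },    # cast the day the item was posted
        { value =>  1, age => 29 },
        { value => -1, age => 10 },
        { value =>  1, age =>  0 },    # cast today
    ],
);

my $score = 0;
for my $vote (@{ $item{votes} }) {
    $score += $vote->{value} * weight($item{age} - $vote->{age});
}

printf "score: %.3f\n", $score;
```

    A date-based bias, as described above, could be folded into weight() by also passing in the date the vote was cast.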

Re: More useful "best" and "worst" nodes display
by davido (Cardinal) on May 12, 2004 at 07:12 UTC
    According to Voting/Experience System, the current value of $NORM, as of 05/12/2004 0007 PDT, is 9.7633. That is, the average node reputation for all nodes created within the past week is 9.7633. That number has been in the nine-ish range for a few weeks at least.

    When I joined the Monastery back in August 2003, the average was somewhere around 11.xxx. So at least in the period of time during which I've been a PerlMonk, the notion of "vote inflation" is not supported by fact. If vote inflation were an issue, one would expect to see $NORM growing over time. Instead, at least in the nine months I've been hanging around here, there has been a slight amount of vote deflation.

    It might be interesting to see a history of $NORM over the entire lifetime of the Monastery. But at least in the recent past, inflation isn't an issue.


    Dave

      I'm new here so I think this is interesting. Surely the vote value will always go down over time if the Monastery grows? So far I noticed I have a vote surplus; I used 2 yesterday and 3 today (I'm normally pretty neutral). You're not going to tell me I only get 8 votes _ever_, are you? So where are the surplus votes going? Probably to /dev/null unless corrupt Abbots are trading them behind the herb gardens. If I am typical and there's a vote surplus, doesn't $NORM depend only on voter activity? Also, what's the relationship between votes cast and reputation? I gave you a vote and you got 10 points. If everybody has 1 vote to give and gave it, then $NORM would be 10 exactly, right? The 0.24 variance is votes that went down the plughole? So I'm guessing votes are a kind of accumulatable currency of value. Wanna buy some votes, maybe I know a guy who knows a guy who has some spare? :)

      Andy

        I don't think I understood your question, but I want to respond anyway.

        1) Votes that are not cast evaporate into the ether after twenty-four hours.

        2) Asking where the "surplus" votes go is meaningless; votes have no existence until they are cast.

        3) $NORM depends on both the number of votes cast and the number of posts made. $NORM is defined (I think) as the average reputation of the nodes created in the last week. In other words,

        $NORM = ( (++ votes on nodes < one week old) - (-- votes on nodes < one week old) ) / (nodes posted in the last week)

        If the number of votes cast and the number of new nodes created both rise at about the same rate, the $NORM will not change much. On the other hand, the extremes (nodes with the highest and lowest reputations) will tend to become more extreme. This is just the nature of statistics*
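
        Taking that definition at face value (it is a guess, as said above), the calculation could be sketched like this. The node records and their fields are invented for the example:

```perl
use strict;
use warnings;

# Hypothetical node records: age in days plus ++ and -- vote counts.
my @nodes = (
    { age_days => 1,  up => 15, down => 2 },
    { age_days => 3,  up =>  8, down => 0 },
    { age_days => 6,  up =>  4, down => 1 },
    { age_days => 20, up => 50, down => 5 },    # older than a week, ignored
);

# Average net reputation of the nodes posted within the last week.
sub norm {
    my @recent = grep { $_->{age_days} < 7 } @_;
    return 0 unless @recent;
    my $net = 0;
    $net += $_->{up} - $_->{down} for @recent;
    return $net / @recent;
}

printf "NORM: %.4f\n", norm(@nodes);
```

        If votes and new nodes both double, the numerator and denominator double together and the result stays put, which is the point made above.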

        ... Which brings me back to the original point. Though it is true that the highest rated nodes this year will (likely) have a higher reputation than the highest rated nodes of years past, I don't see how it matters. There is not a "highest rated node of all time", only highest rated in the last day, week, month, or year. Any effect of increased voting will most likely wash out over all periods but the last, and even the yearly list should not be affected too much. (In my untutored opinion.)

        -- Fuzzy Frog

        *Which is not to say that I understand it in any deep way.
Re: More useful "best" and "worst" nodes display
by eric256 (Parson) on May 12, 2004 at 01:00 UTC

    Wouldn't a ratio of ++ votes to -- votes give the same sort of indication? I mean then you can see that the node mostly got good votes, mostly got bad votes, got all bad votes, got all good votes, etc. Obviously you would show these as percentages or ratios, not those text meanings. A very interesting idea though.
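
    Something like this, assuming separate ++ and -- counts were available per node (the site may only keep the net reputation). The helper name and the counts are made up:

```perl
use strict;
use warnings;

# Hypothetical helper: percentage of a node's votes that were ++.
# Returns nothing for a node with no votes at all.
sub approval_pct {
    my ($up, $down) = @_;
    my $total = $up + $down;
    return unless $total;
    return 100 * $up / $total;
}

# Made-up vote counts: [ ++ votes, -- votes ].
for my $node ([40, 10], [3, 0], [2, 18]) {
    printf "%d++ / %d--: %.0f%% positive\n", @$node, approval_pct(@$node);
}
```

    Note that a node with 3 votes and a node with 300 can show the same percentage, so the vote count probably needs to be displayed alongside it.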


    ___________
    Eric Hodges
Re: More useful "best" and "worst" nodes display
by artist (Parson) on May 12, 2004 at 02:38 UTC
    Nodes are considered 'best' in terms of their usefulness to 'you' (in whatever way) in the present, whether they come from the past, present or future. So many discoveries of the past were virtually unknown to the 'whole world' at the time, but they are very useful today. Many of today's things are widespread (thanks to digital media), but useful to very few.

    What counts as 'best' changes from time to time and differs from one person to another. Sites like StumbleUpon understand this very well.

Node Type: monkdiscuss [id://352570]
Approved by b10m