Re^2: More PM stats analysis on new levels

Re^3: More PM stats analysis on new levels by demerphq (Chancellor) on Dec 05, 2005 at 13:05 UTC
What do you mean "zombie" initiates? Inactive?

Sorry, I should have been more clear. Zombies are users that never posted, never voted, and never really used their accounts.

I think we could look into providing you a batch of more specific data. I'd have to think a bit on how to present the info so that it doesn't tell you each node's rep exactly, but does allow you to do your stats. If you can suggest forms of the info that would be sufficiently useful to you but sufficiently anonymous that I can give them to you, I'd be happy to do so.

--- $world=~s/war/peace/g
Re^4: More PM stats analysis on new levels by xdg (Monsignor) on Dec 05, 2005 at 14:05 UTC
The data set I'd love to get is the number of nodes and sum of node reputations for initial posts and replies in each category of Perlmonks. If I had that by user, plus user XP and maybe even date user joined, that would be a fantastic data set.

The reason that "by user" helps is that it easily allows clearing out outliers like the nodereaper and zombies. For anonymity, the data set doesn't even need to have user name/home-node id -- though that doesn't really protect the anonymity of the Saints in our book.

If by user (even masked) isn't sufficiently anonymous, then those same stats summarized by monk level would be sufficient, as long as vroom/antivroom/nodereaper/zombie accounts were stripped out first.

Does that address the anonymity concern?

-xdg

Code written by xdg and posted on PerlMonks is public domain. It is provided as is with no warranties, express or implied, of any kind. Posted code may not have been tested. Use of posted code is at your own risk.
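To make the "summarized by monk level" fallback concrete, here is a minimal sketch in Python. The row layout, the field names (`id`, `level`, `nodes`, `rep_sum`, `votes_cast`), and the sample values are all invented for illustration; nothing here reflects the real PerlMonks schema. It drops excluded accounts and zombies (never posted, never voted) before totalling nodes and reputation per level.

```python
from collections import defaultdict

# Hypothetical per-user rows; every field name and value is made up.
users = [
    {"id": 1, "level": 1, "nodes": 0, "rep_sum": 0, "votes_cast": 0},    # zombie
    {"id": 2, "level": 1, "nodes": 3, "rep_sum": 12, "votes_cast": 5},
    {"id": 3, "level": 2, "nodes": 10, "rep_sum": 85, "votes_cast": 40},
    {"id": 4, "level": 2, "nodes": 4, "rep_sum": 30, "votes_cast": 0},
    {"id": 5, "level": 9, "nodes": 500, "rep_sum": -900, "votes_cast": 0},  # stand-in system account
]


def is_zombie(user):
    """A zombie, as described in the thread: never posted and never voted."""
    return user["nodes"] == 0 and user["votes_cast"] == 0


def summarize_by_level(rows, excluded_ids=frozenset()):
    """Total user count, node count, and reputation sum per level.

    Rows whose id is in excluded_ids, and zombie rows, are skipped first.
    """
    totals = defaultdict(lambda: {"users": 0, "nodes": 0, "rep_sum": 0})
    for user in rows:
        if user["id"] in excluded_ids or is_zombie(user):
            continue
        bucket = totals[user["level"]]
        bucket["users"] += 1
        bucket["nodes"] += user["nodes"]
        bucket["rep_sum"] += user["rep_sum"]
    return dict(totals)


# Exclude the stand-in system account (id 5 is an arbitrary example id).
summary = summarize_by_level(users, excluded_ids={5})
```

Because only per-level totals leave the function, no individual user's figures are exposed, though a level with very few members would still reveal a lot about them.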
Re^5: More PM stats analysis on new levels by demerphq (Chancellor) on Dec 05, 2005 at 17:41 UTC
The only bit I don't get is what you mean by category. Do you mean nodetype?

--- $world=~s/war/peace/g
Re^6: More PM stats analysis on new levels by xdg (Monsignor) on Dec 05, 2005 at 17:48 UTC
Re^7: More PM stats analysis on new levels by demerphq (Chancellor) on Dec 05, 2005 at 17:57 UTC
Re^4: More PM stats analysis on new levels by QM (Parson) on Dec 05, 2005 at 14:45 UTC
I'd have to think a bit on how to present the info so that it doesn't tell you each node's rep exactly, but does allow you to do your stats.

How about adding random noise to the XP of each post? Use some rather large uniform distribution (say +/-100?), but don't report the size of the distribution. As long as the mean remains relatively unchanged, the stats should too. Or choose a different distribution. The weakness is that someone could roughly guess the size of the distribution from the largest negative value, and from there guess some of the lowest scoring nodes.

Another idea is to take nodes in pairs at random, and shuffle their XP a little. If two nodes have 17 and 48 XP, change them randomly by +/-5 in opposite directions, so that the sum is still the same. Do this randomly across many pairs (not necessarily all), such that most nodes have changed only slightly. Then each slice of the XP distribution should be stable, and guessing XP is much harder for low scoring nodes.

If xdg is going to use post order, or distinguish between different "grades" of XP, then the distribution must be chosen more carefully. After all, a Max or Min XP stat would be meaningless, and a plot of XP by post order, or XP by calendar date, might be bogus.

Update: You can only give this out a few times. After the 5th or 10th set, the average of a node's reported XP tends to settle down near its real value, unless you can come up with wildly differing distributions every time.

-QM
--
Quantum Mechanics: The dreams stuff is made of
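For anyone who wants to play with the two masking ideas above, here is a rough Python sketch. The function names, the sample values, and the parameters (`spread`, `max_shift`, `n_pairs`) are my own placeholders, not anything from the site's code. The first function adds independent uniform noise to each value; the second picks random pairs and moves one value up and the other down by the same amount, so the total is unchanged. The last function illustrates the concern in the update: averaging many independently noised releases of the same value drifts back toward the real one.

```python
import random


def add_uniform_noise(values, spread, rng):
    """Shift each value by an independent integer drawn from [-spread, spread]."""
    return [v + rng.randint(-spread, spread) for v in values]


def perturb_pairs(values, max_shift, n_pairs, rng):
    """Pick n_pairs random pairs; move one member up and the other down
    by the same random amount, so the overall sum is preserved."""
    out = list(values)
    for _ in range(n_pairs):
        i, j = rng.sample(range(len(out)), 2)   # two distinct positions
        shift = rng.randint(-max_shift, max_shift)
        out[i] += shift
        out[j] -= shift
    return out


def average_of_releases(true_value, spread, n_releases, rng):
    """Average n_releases independently noised copies of one value."""
    total = sum(true_value + rng.randint(-spread, spread) for _ in range(n_releases))
    return total / n_releases


rng = random.Random(2005)            # fixed seed so runs are repeatable
reps = [17, 48, 3, -2, 60, 11, 25]   # made-up values

noisy = add_uniform_noise(reps, spread=100, rng=rng)
shuffled = perturb_pairs(reps, max_shift=5, n_pairs=4, rng=rng)
leaked = average_of_releases(17, spread=100, n_releases=2000, rng=rng)
```

Note that a node picked for several pairs can drift by more than `max_shift`, which is one reason to keep `n_pairs` modest relative to the number of nodes.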
Re^5: More PM stats analysis on new levels by xdg (Monsignor) on Dec 05, 2005 at 15:26 UTC
How about adding random noise to the XP of each post

Careful with terminology here. Users have XP. Posts have reputation. I really wouldn't need per-post reputation for what I was thinking of doing if I can get the aggregate statistics I mentioned.

-xdg
Re^3: More PM stats analysis on new levels (large query result) by demerphq (Chancellor) on Jan 14, 2006 at 17:42 UTC
Well, I put together the following query for you. I don't think it's exactly what you had in mind, but it's more than nothing. It's a breakdown of posts by type by level of author. Of course it's by level of author _now_, not when originally posted. It does not include reaped nodes.

Read more... (24 kB)

And this is the breakdown of the notes by the type of the root node of the thread.

Read more... (26 kB)

--- $world=~s/war/peace/g
Re^3: More PM stats analysis on new levels (large query result) by demerphq (Chancellor) on Jan 14, 2006 at 19:04 UTC
I also put this one together for you. It's a breakdown of posts by type, level of poster and (bucketized) node reputation.

    select t.title typetitle,
           lb.level,
           CEIL(n.reputation/10)*10 noderep,
           count(n.node_id) nodecount
      from node n, node a, user u, node t, level_buckets lb
     where n.author_user = a.node_id
       and n.type_nodetype = t.node_id
       and a.node_id = u.user_id
       and CEIL(u.experience/10)*10 = lb.experience
       and n.author_user != 52855
       and n.type_nodetype in (31670, 1042, 31663, 1036, 11, 935, 1588,
                               173295, 121, 120, 23614, 23615, 115, 956,
                               389544, 1584, 337433, 1440, 7487, 7488,
                               1980, 1981, 1748, 1749)
     group by t.title, lb.level, noderep
     order by t.title, lb.level, noderep

Read more... (83 kB)
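As a small illustration of what the bucketizing in that query does, here is a Python sketch that rounds each reputation up to the next multiple of 10 and counts rows per (type, level, bucket), mirroring the `group by`. The sample rows and the helper names are invented for the example; only the rounding rule is taken from the query.

```python
from collections import Counter


def bucketize(reputation, width=10):
    """Round up to the next multiple of width, like CEIL(rep/width)*width.

    Integer arithmetic is used so the ceiling is exact for negative values too.
    """
    return -(-reputation // width) * width


def count_by_bucket(rows):
    """Count rows per (type title, author level, reputation bucket)."""
    return Counter((title, level, bucketize(rep)) for title, level, rep in rows)


# Made-up rows: (type title, author level, node reputation).
rows = [
    ("note", 3, 17),
    ("note", 3, 12),
    ("note", 3, 25),
    ("perlquestion", 1, -3),
    ("perlquestion", 1, 0),
]

counts = count_by_bucket(rows)
```

With this rule a bucket labelled 20 covers reputations 11 through 20, so the exact reputation of any single node is hidden within a span of ten.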