What do you mean "zombie" initiates? Inactive?
Sorry, I should have been more clear. Zombies are users that never posted, never voted, and never really used their accounts.
I think we could look into providing you a batch of more specific data. I'd have to think a bit about how to present the info so that it doesn't tell you each node's rep exactly but does allow you to do your stats. If you can suggest forms of the info that would be sufficiently useful to you but sufficiently anonymous that I can give them to you, I'd be happy to do so.
---
$world=~s/war/peace/g
| [reply] |
The data set I'd love to get is the number of nodes and sum of node reputations for initial posts and replies in each category of Perlmonks. If I had that by user, plus user XP and maybe even date user joined, that would be a fantastic data set.
The reason that "by user" helps is that it easily allows clearing out outliers like the nodereaper and zombies. For anonymity, the data set doesn't even need to have user name/home-node id -- though that doesn't really protect the anonymity of the Saints in our book. If by user (even masked) isn't sufficiently anonymous, then those same stats summarized by monk level would be sufficient, as long as vroom/antivroom/nodereaper/zombie accounts were stripped out first.
Does that address the anonymity concern?
-xdg
Code written by xdg and posted on PerlMonks is public domain. It is provided as is with no warranties, express or implied, of any kind. Posted code may not have been tested. Use of posted code is at your own risk.
| [reply] |
I'd have to think a bit about how to present the info so that it doesn't tell you each node's rep exactly but does allow you to do your stats.
How about adding random noise to the XP of each post? Use some rather large uniform distribution (say +/-100?), but don't report the size of the distribution. As long as the mean remains relatively unchanged, the stats should too. Or choose a different distribution. The weakness is that the width of the distribution can be roughly guessed from the largest negative value, and then some of the lowest-scoring nodes could be guessed too.
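A minimal sketch of the noise idea, in Python for brevity. The reputations here are made up, and `add_uniform_noise` is a hypothetical helper, not anything that exists on the site. It only illustrates that the mean survives the noise while the most negative reported value hints at the noise width.

```python
import random

def add_uniform_noise(reps, half_width, rng):
    """Return a copy of reps with independent integer noise in [-half_width, +half_width]."""
    return [r + rng.randint(-half_width, half_width) for r in reps]

rng = random.Random(42)  # fixed seed so the run is repeatable

# Made-up node reputations; real data would come from the site.
true_reps = [rng.randint(-5, 60) for _ in range(10_000)]
noisy_reps = add_uniform_noise(true_reps, 100, rng)

true_mean = sum(true_reps) / len(true_reps)
noisy_mean = sum(noisy_reps) / len(noisy_reps)
print(f"true mean {true_mean:.2f}, noisy mean {noisy_mean:.2f}")

# The leak: if real reputations rarely go far below zero, the most
# negative reported value sits close to minus the noise half-width.
print("most negative reported value:", min(noisy_reps))
```

With enough nodes the two means land close together, but anyone who knows that real reputations bottom out near zero can read the approximate half-width off the minimum.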
Another idea is to take nodes in pairs at random and shuffle their XP a little. If two nodes have 17 and 48 XP, add a random amount of up to +/-5 to one and subtract the same amount from the other, so that the sum stays the same.
Do this randomly across many pairs (not necessarily all), such that most nodes have changed only slightly. Then each slice of the XP distribution should be stable, and guessing XP is much harder for low scoring nodes.
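Here is a rough sketch of the pairing idea, again in Python with made-up numbers. `pairwise_shuffle` and its `fraction` parameter are my own illustrative names. I am assuming each node lands in at most one pair, so no node moves by more than the chosen shift.

```python
import random

def pairwise_shuffle(reps, max_shift, fraction, rng):
    """Pair nodes at random and move up to max_shift points from one node
    of each pair to the other, so every pair's sum (and the total) is unchanged."""
    out = list(reps)
    idx = list(range(len(out)))
    rng.shuffle(idx)                           # random pairing: idx[0] with idx[1], etc.
    n_pairs = int(len(idx) // 2 * fraction)    # only perturb some of the pairs
    for k in range(n_pairs):
        i, j = idx[2 * k], idx[2 * k + 1]
        shift = rng.randint(-max_shift, max_shift)
        out[i] += shift
        out[j] -= shift
    return out

rng = random.Random(7)
reps = [rng.randint(-5, 60) for _ in range(1000)]   # made-up reputations
shuffled = pairwise_shuffle(reps, max_shift=5, fraction=0.8, rng=rng)

print("total preserved:", sum(reps) == sum(shuffled))
print("largest change:", max(abs(a - b) for a, b in zip(reps, shuffled)))
```

Because pairs are drawn across the whole set, the overall total is exact, but a total taken over a subset (one section, one level) is only approximately preserved unless pairs are drawn within that subset.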
If xdg is going to use post order, or distinguish between different "grades" of XP, then the distribution must be chosen more carefully. After all, a Max or Min XP stat would be meaningless, and a plot of XP by post order, or XP by calendar date might be bogus.
Update: You can only give this out a few times. After the 5th or 10th set, the average of a node's reported XP across sets tends to settle down near its real value, unless you can come up with wildly differing distributions every time.
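To put a rough number on that, here is a small Python sketch assuming each release adds fresh, independent integer noise drawn uniformly from +/-5 to the same node. The function names and the true value of 23 are invented for illustration.

```python
import math
import random

def noise_std(half_width):
    """Standard deviation of integer noise drawn uniformly from [-half_width, +half_width]."""
    n = 2 * half_width + 1          # number of equally likely values
    return math.sqrt((n * n - 1) / 12)

def average_of_releases(true_rep, half_width, releases, rng):
    """Average the values reported for one node across several noisy releases."""
    reported = [true_rep + rng.randint(-half_width, half_width) for _ in range(releases)]
    return sum(reported) / releases

rng = random.Random(1)
for releases in (1, 5, 10, 100):
    estimate = average_of_releases(23, 5, releases, rng)
    expected_error = noise_std(5) / math.sqrt(releases)
    print(f"{releases:3d} releases: estimate {estimate:6.2f}, typical error about {expected_error:.2f}")
```

The typical error shrinks with the square root of the number of releases, so how many sets are safe depends on how wide the noise is. Wider noise buys more releases but hurts the stats more.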
-QM
--
Quantum Mechanics: The dreams stuff is made of
| [reply] |
Well, I put together the following query for you. I don't think it's exactly what you had in mind, but it's better than nothing. It's a breakdown of posts by type and by level of author. Of course, that is the author's level _now_, not when the node was originally posted. It does not include reaped nodes.
And this is the breakdown of the notes by the type of the root node of the thread.
---
$world=~s/war/peace/g
| [reply] |
I also put this one together for you. It's a breakdown of posts by type, level of poster, and (bucketized) node reputation.
select t.title typetitle, lb.level, CEIL(n.reputation/10)*10 noderep, count(n.node_id) nodecount
from node n, node a, user u, node t, level_buckets lb
where n.author_user = a.node_id
and n.type_nodetype = t.node_id
and a.node_id = u.user_id
and CEIL(u.experience/10)*10 = lb.experience
and n.author_user != 52855
and n.type_nodetype in (31670, 1042, 31663, 1036, 11, 935, 1588, 173295, 121, 120, 23614, 23615, 115,
956, 389544, 1584, 337433, 1440, 7487, 7488, 1980, 1981, 1748, 1749)
group by t.title, lb.level, noderep
order by t.title, lb.level, noderep
| [reply] [d/l] |
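For anyone reading the output, the `CEIL(n.reputation/10)*10` expression rounds each reputation up to the next multiple of 10. A small Python sketch of the same bucketing follows, with invented sample reputations. `rep_bucket` is a hypothetical name, and the integer arithmetic assumes reputations are whole numbers.

```python
from collections import Counter

def rep_bucket(rep, width=10):
    """Round rep up to the next multiple of width, like CEIL(rep/width)*width."""
    return -(-rep // width) * width   # ceiling division using integers only

# Invented reputations, just to show which bucket each value falls into.
sample_reps = [-12, -9, -1, 0, 1, 7, 10, 11, 25, 30, 31]
counts = Counter(rep_bucket(r) for r in sample_reps)

for bucket in sorted(counts):
    print(f"bucket {bucket:4d}: {counts[bucket]} node(s)")
```

So a bucket labelled 10 covers reputations 1 through 10, and the bucket labelled 0 covers -9 through 0, which means zero-rep and mildly negative nodes are counted together.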