Beefy Boxes and Bandwidth Generously Provided by pair Networks
Perl-Sensitive Sunglasses
 
PerlMonks  

Querying Saints in our Book (2)

by chacham (Prior)
on May 13, 2016 at 14:40 UTC ( [id://1162965]=monkdiscuss: print w/replies, xml ) Need Help??

Extending Querying Saints in our Book, the script now includes Monk Levels, which helps when trying to group or order the data. More importantly, (if it indeed can be important) the query now includes XP Ratio, which is calculated by XP / Writeups.

Fabricated numbers of some monks and monks with no writeups are outliers that mess up the numbers. So, the query returns two sets of ratios, with and without exclusions. Perhaps a more complicated rule for outliers can be devised.

// ==UserScript== // @name Perlmonks Saints in our Book to Oracle // @namespace http://www.example.com/chacham/ // @description Turn the data in Perlmonks Saints in our Book into an +Oracle query with CTEs // @include http://perlmonks.org/?node_id=3559 // @version 1.1.1 // @grant none // ==/UserScript== var query = "WITH\ \n\t-- From: http://perlmonks.org/?node_id=509805\ \n\tMonk_Level(Id, XP_Low, XP_High, Title)\ \nAS\ \n\t(\ \n\t SELECT 13, 003000, 03999, 'Curate' FROM Dual UNION ALL\ \n\t SELECT 14, 004000, 05399, 'Priest' FROM Dual UNION ALL\ \n\t SELECT 15, 005400, 06999, 'Vicar' FROM Dual UNION ALL\ \n\t SELECT 16, 007000, 08999, 'Parson' FROM Dual UNION ALL\ \n\t SELECT 17, 009000, 11999, 'Prior' FROM Dual UNION ALL\ \n\t SELECT 18, 012000, 15999, 'Monsignor' FROM Dual UNION ALL\ \n\t SELECT 19, 016000, 21999, 'Abbot' FROM Dual UNION ALL\ \n\t SELECT 20, 022000, 29999, 'Canon' FROM Dual UNION ALL\ \n\t SELECT 21, 030000, 39999, 'Chancellor' FROM Dual UNION ALL\ \n\t SELECT 22, 040000, 49999, 'Bishop' FROM Dual UNION ALL\ \n\t SELECT 23, 050000, 59999, 'Archbishop' FROM Dual UNION ALL\ \n\t SELECT 24, 060000, 69999, 'Cardinal' FROM Dual UNION ALL\ \n\t SELECT 25, 070000, 79999, 'Sage' FROM Dual UNION ALL\ \n\t SELECT 26, 080000, 89999, 'Saint' FROM Dual UNION ALL\ \n\t SELECT 27, 090000, 99999, 'Apostle' FROM Dual UNION ALL\ \n\t SELECT 28, 100000, NULL, 'Pope' FROM Dual\ \n\t),\ \n\tData(Ranking, Name, XP, Writeups)\nAS\n\t("; var table = document.getElementsByTagName('table')[4]; for(var row = 1; row < table.rows.length; row++) query += "\n\t SELECT " + table.rows[row].cells[0].textContent + ",\t'" + table.rows[row].cells[1].textContent + "',\t" + table.rows[row].cells[2].textContent + ",\t" + table.rows[row].cells[4].textContent.replace('None', 0) + "\t FROM + Dual UNION ALL"; // Chop off the extra UNION ALL query = query.slice(0, -10) + "\n\t),\ \n\tData_Level(Monk_Level, Title, Ranking, Name, XP, Writeups)\ \nAS\ \n\t(\ \n\t SELECT\ \n\t\tMonk_Level.Id,\ \n\t\tMonk_Level.Title,\ \n\t\tData.Ranking,\ \n\t\tData.Name,\ \n\t\tData.XP,\ \n\t\tData.Writeups\ \n\t FROM\ \n\t\tData,\ \n\t\tMonk_Level\ \n\t WHERE\ \n\t\tData.XP\tBETWEEN Monk_Level.XP_Low AND Monk_Level.XP_High\ \n\t OR\tData.XP >\tMonk_Level.XP_Low AND Monk_Level.XP_High IS NULL +\ \n\t),\ \n\tData_Level_Calc(Monk_Level, Title, Ranking, Name, XP, Writeups, XP +_Ratio, Exclude)\ \nAS\ \n\t(\ \n\t SELECT\ \n\t\tMonk_Level,\ \n\t\tTitle,\ \n\t\tRanking,\ \n\t\tName,\ \n\t\tXP,\ \n\t\tWriteups,\ \n\t\tXP / CASE Writeups WHEN 0 THEN 1 ELSE Writeups END,\ \n\t\t-- Monks with fabricated numbers or 0 Writeups mess up the calcu +lations.\ \n\t\tCASE WHEN Ranking > 1 AND Writeups > 0 THEN 1 END\ \n\t FROM\ \n\t\tData_Level\ \n\t),\ \n\tXP_Ratio_By_Level(Monk_Level, Title, Min, Max, Avg, Min_X, Max_X, +Avg_X)\ \nAS\ \n\t(\ \n\t SELECT\ \n\t\tMonk_Level,\ \n\t\tTitle,\ \n\t\tMIN(XP_Ratio),\ \n\t\tMAX(XP_Ratio),\ \n\t\tAVG(XP_Ratio),\ \n\t\t-- Exclude from functions by NULLing out the number.\ \n\t\tMIN(XP_Ratio * Exclude),\ \n\t\tMAX(XP_Ratio * Exclude),\ \n\t\tAVG(XP_Ratio * Exclude)\ \n\t FROM\ \n\t\tData_Level_Calc\ \n\t GROUP BY\ \n\t\tMonk_Level,\ \n\t\tTitle\ \n\t UNION ALL\ \n\t SELECT\ \n\t\tNULL,\ \n\t\t'(All)',\ \n\t\tMIN(XP_Ratio),\ \n\t\tMAX(XP_Ratio),\ \n\t\tAVG(XP_Ratio),\ \n\t\tMIN(XP_Ratio * Exclude),\ \n\t\tMAX(XP_Ratio * Exclude),\ \n\t\tAVG(XP_Ratio * Exclude)\ \n\t FROM\ \n\t\tData_Level_Calc\ \n\t),\ \n\tStep_XP_Writeups(Ranking, Name, XP, Writeups, XP_Diff, Writeups_Di +ff)\ \nAS\ \n\t(\ \n\t SELECT\ \n\t\tRanking,\ \n\t\tName,\ \n\t\tXP,\ \n\t\tWriteups,\ \n\t\tXP - LEAD(XP) OVER(ORDER BY Ranking),\ \n\t\tWriteups - LEAD(Writeups) OVER(ORDER BY Ranking)\ \n\t FROM\ \n\t\tData\ \n),\ \n\tClose(Ranking, XP_Diff, Writeups_Diff)\ \nAS\ \n\t(\ \n\t SELECT\ \n\t\tRanking,\ \n\t\tXP - LEAD(XP) OVER(ORDER BY Ranking),\ \n\t\tWriteups - LEAD(Writeups) OVER(ORDER BY Ranking)\ \n\t FROM\ \n\t\tStep_XP_Writeups\ \n\t WHERE\ \n\t\tXP_Diff\tBETWEEN -30 AND 30\ \n\t AND\tWriteups_Diff\tBETWEEN -5 AND 5\ \n),\ \n\tMMA(XP_Diff_Min, XP_Diff_Max, XP_Diff_Avg, Writeups_Diff_Min, Writ +eups_Diff_Max, Writeups_Diff_Avg)\ \nAS\ \n\t(\ \n\t SELECT\ \n\t\tMIN(XP_Diff),\ \n\t\tMAX(XP_Diff),\ \n\t\tAVG(XP_Diff),\ \n\t\tMIN(Writeups_Diff),\ \n\t\tMAX(Writeups_Diff),\ \n\t\tAVG(Writeups_Diff)\ \n\t FROM\ \n\t\tStep_XP_Writeups\ \n\t),\ \n\tStats(Both, Close, XP, XP_Diff_Min, XP_Diff_Max, XP_Diff_Avg, Writ +eups, Writeups_Diff_Min, Writeups_Diff_Max, Writeups_Diff_Avg)\ \nAS\ \n\t(\ \n\t SELECT\ \n\t\t(SELECT COUNT(COUNT(*)) FROM Data GROUP BY XP, Writeups HAVIN +G Count(*) > 1),\ \n\t\t(SELECT COUNT(*) FROM Close),\ \n\t\t(SELECT COUNT(COUNT(*)) FROM Data GROUP BY XP HAVING Count(*) > +1),\ \n\t\tXP_Diff_Min,\ \n\t\tXP_Diff_Max,\ \n\t\tXP_Diff_Avg,\ \n\t\t(SELECT COUNT(COUNT(*)) FROM Data GROUP BY Writeups HAVING Count +(*) > 1) Writeups,\ \n\t\tWriteups_Diff_Min,\ \n\t\tWriteups_Diff_Max,\ \n\t\tWriteups_Diff_Avg\ \n\t FROM\ \n\t\tMMA\n\t)\ \n--SELECT Monk_Level, Title, ROUND(Min, 2) Min, ROUND(Max, 2) Max, RO +UND(Avg, 2) Avg, ROUND(Min_X, 2) Min_X, ROUND(Max_X, 2) Max_X, ROUND( +Avg_X, 2) Avg_X FROM XP_Ratio_By_Level ORDER BY Monk_Level\ \n--SELECT Both, Close, XP, XP_Diff_Min, XP_Diff_Max, XP_Diff_Avg, Wri +teups, Writeups_Diff_Min, Writeups_Diff_Max, Writeups_Diff_Avg FROM S +tats\ \nSELECT\ \n\tData.Ranking,\ \n\tData.Name,\ \n\tData.XP,\ \n\tData.Writeups\ \nFROM\ \n\tData,\n\tClose\ \nWHERE\ \n\tData.Ranking BETWEEN Close.Ranking AND Close.Ranking + 1\ \nORDER BY\ \n\tData.Ranking;"; var textarea = document.createElement('textarea'); textarea.cols = 100; textarea.rows = 20; textarea.value = query; document.body.appendChild(textarea);

The current output of the (commented out) query is:

MONK_LEVEL TITLE MIN MAX AVG MIN_X + MAX_X AVG_X ---------- ---------- ---------- ---------- ---------- ---------- ---- +------ ---------- 13 Curate .07 3654 53 .07 + 727.8 35.26 14 Priest 4.66 4560 65.43 4.66 + 4560 65.43 15 Vicar 5.43 1604 47.12 5.43 + 1604 47.12 16 Parson 6.53 7619 129.17 6.53 + 371.8 31.9 17 Prior 6.16 814.62 46.28 6.16 +814.62 46.28 18 Monsignor 5 243.02 33.02 5 +243.02 33.02 19 Abbot 4.03 1103.75 73.21 4.03 1 +103.75 73.21 20 Canon 5.27 127.72 23.32 5.27 +127.72 23.32 21 Chancellor 6.45 55.35 14.3 6.45 + 55.35 14.3 22 Bishop 9.51 78.21 21.93 9.51 + 78.21 21.93 23 Archbishop 7.77 16.04 10.9 7.77 + 16.04 10.9 24 Cardinal 10.26 20.19 15.22 10.26 + 20.19 15.22 25 Sage 8.99 11.26 10.44 8.99 + 11.26 10.44 28 Pope 6.48 1659.73 421.35 6.48 + 12.06 8.56 (All) .07 7619 61.85 .07 + 4560 43.37

Notable, perhaps, is the average with exclusions is just over 43. Certainly more outliers can be excluded to "correct" that number. :)

The use of this data can be called into question, though it is data and data is fun, right?

FWIW, i occasionally update some useless data in Re: planetscape and me: 1282 nodes apiece. Though, i still haven't figured out why.

Replies are listed 'Best First'.
Re: Querying Saints in our Book (2)
by hippo (Bishop) on May 13, 2016 at 15:29 UTC

    The curate with the somewhat dubious XP per post ratio of 0.07 has to be NodeReaper who must surely qualify as an outlier to be discounted. Nice work, though.

      Thanx. Right now, Nodereaper has 3867 XP and 57005 Writeups, so it's definitely the culprit. Adding the exclusion (to my local script) moves the minimum up to 3.92, an honor held by some Moron, and nudges up the average for Curate and All groups as well:

      MONK_LEVEL TITLE MIN MAX AVG MIN_X + MAX_X AVG_X ---------- ---------- ---------- ---------- ---------- ---------- ---- +------ ---------- 13 Curate .07 3654 53 3.92 + 727.8 35.43 14 Priest 4.66 4560 65.43 4.66 + 4560 65.43 15 Vicar 5.43 1604 47.12 5.43 + 1604 47.12 16 Parson 6.53 7619 129.17 6.53 + 371.8 31.9 17 Prior 6.16 814.62 46.28 6.16 +814.62 46.28 18 Monsignor 5 243.02 33.02 5 +243.02 33.02 19 Abbot 4.03 1103.75 73.21 4.03 1 +103.75 73.21 20 Canon 5.27 127.72 23.32 5.27 +127.72 23.32 21 Chancellor 6.45 55.35 14.3 6.45 + 55.35 14.3 22 Bishop 9.51 78.21 21.93 9.51 + 78.21 21.93 23 Archbishop 7.77 16.04 10.9 7.77 + 16.04 10.9 24 Cardinal 10.26 20.19 15.22 10.26 + 20.19 15.22 25 Sage 8.99 11.26 10.44 8.99 + 11.26 10.44 28 Pope 6.48 1659.73 421.35 6.48 + 12.06 8.56 (All) .07 7619 61.85 3.92 + 4560 43.44

      Good catch! I plan to add that after the weekend in a script update to oujs.

      NodeReaper added to version 1.1.2. Had to change ranking to username, as ranking is not userid. But i knew that already...

      I'm hoping for more suggestions about whom to exclude or modify, as inactive users who get the overwhelming majority of their points via voting are not really representative of the ratio. OTOH, except for the egregious, the outliers probably do not mess up the ratio much anyway.

      I'm curious also if there is interest is displaying each monk's ratio and its relation to the level and all groups' ratios. The possibilities for useless information are endless.

        suggestions about whom to exclude or modify, as inactive users who get the overwhelming majority of their points via voting are not really representative of the ratio

        I added fields in "Ranking the Saints by XP Efficiency" to require having visited the site in the most recent N weeks, and having made a minimum number of posts. Did not add a filter to require having posted in the last N weeks, which would have improved the quality of the rankings even more, but I think it could be done. Also my script can be configured to deduct 0.5 XP per day in an attempt to get closer to the XP gained from posting rather than longevity.


        The way forward always starts with a minimal test.
Re: Querying Saints in our Book (2)
by LanX (Saint) on May 17, 2016 at 13:31 UTC

      He was excluded from the beginning, as his numbers really mess up the average.

        ahhh ... *_X are the corrected numbers .... got it.

        You might be interested in Median and other techniques to eliminate outliers.

        You could even cluster different user types ( "voters" vs "posters" )

        Personally I'd like to see a XP per response level stat ( Re^n: ), but accessing the DB is outside of your possibilities.

        Cheers Rolf
        (addicted to the Perl Programming Language and ☆☆☆☆ :)
        Je suis Charlie!

Log In?
Username:
Password:

What's my password?
Create A New User
Domain Nodelet?
Node Status?
node history
Node Type: monkdiscuss [id://1162965]
Approved by davies
help
Chatterbox?
and the web crawler heard nothing...

How do I use this?Last hourOther CB clients
Other Users?
Others goofing around in the Monastery: (6)
As of 2024-04-23 16:24 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?

    No recent polls found