[OT] Data visualisation

BrowserUk has asked for the wisdom of the Perl Monks concerning the following question:

Re: [OT] Data visualisation by RichardK (Parson) on Feb 07, 2015 at 17:02 UTC
That's a lot of variables to try to visualise in one go. What about creating a synthetic metric of some sort that combines the 3 that you've got? (Mem * Insert * find) is possibly the simplest, but that will depend on what you're trying to measure. Then you can draw a 3D plot for each value of P1 showing P2:P3:metric. That's a lot of plots, but it might tell you something (or not!). I've used synthetic metrics and found them to be quite useful, but you do need to educate your users that the absolute values are meaningless and only the relative differences have any sort of significance.
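A minimal sketch of the product metric suggested above, in Python. The rows are invented placeholder values, not the measurements from the original question, and the field names (`p1`, `p2`, `p3`, `mem`, `insert`, `find`) are assumptions made for illustration.

```python
# Hypothetical rows: p1..p3 are tuning parameters; mem, insert and find are
# the measured outputs. All values are invented for illustration.
rows = [
    {"p1": 512, "p2": 2, "p3": 4, "mem": 100.0, "insert": 2.0, "find": 1.0},
    {"p1": 2048, "p2": 4, "p3": 8, "mem": 60.0, "insert": 5.0, "find": 2.0},
    {"p1": 1024, "p2": 2, "p3": 4, "mem": 80.0, "insert": 3.0, "find": 1.5},
]


def product_metric(row):
    """Combine the three outputs into one number (assumes lower is better)."""
    return row["mem"] * row["insert"] * row["find"]


def rank_by_metric(rows):
    """Return the rows sorted best-first by the synthetic metric."""
    return sorted(rows, key=product_metric)


ranked = rank_by_metric(rows)
for row in ranked:
    print(row["p1"], row["p2"], row["p3"], product_metric(row))
```

Only the ordering is meaningful here; the absolute value of the product has no unit worth interpreting.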
Re^2: [OT] Data visualisation by hdb (Monsignor) on Feb 07, 2015 at 17:59 UTC
I support your proposal. I would go for a weighted sum of the target variables, e.g. dividing the first by 100000, and then find the best combination. Playing with the weights for each column can help. Or run some multivariate linear statistics on each of the metrics to see what the relationship looks like. If you get high R^2s then life will be easy. There might even be a module to do that...
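The weighted-sum idea could be sketched roughly as follows, in Python. The weights other than the 1/100000 for the first column, the labels and all data values are assumptions for illustration; the thread does not give real weights.

```python
# Hypothetical weights: scale the first (memory) column down by 100000 as
# suggested above; the other two weights are placeholders to play with.
weights = {"mem": 1 / 100000, "insert": 1.0, "find": 1.0}

# Invented sample rows; not the data from the original question.
rows = [
    {"label": "A", "mem": 400000.0, "insert": 3.0, "find": 2.0},
    {"label": "B", "mem": 200000.0, "insert": 4.0, "find": 2.5},
    {"label": "C", "mem": 900000.0, "insert": 1.0, "find": 1.0},
]


def weighted_sum(row, weights):
    """Weighted sum of the output columns named in `weights`."""
    return sum(w * row[name] for name, w in weights.items())


# Assuming lower is better for every output, the best row minimises the sum.
best = min(rows, key=lambda r: weighted_sum(r, weights))
print(best["label"], weighted_sum(best, weights))
```

Changing a weight and re-running shows how sensitive the "best" combination is to the relative importance given to each column.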
Re^2: [OT] Data visualisation by LanX (Saint) on Feb 08, 2015 at 06:29 UTC
> What about creating a synthetic metric of some sort that combines the 3 that you've got ? (Mem * Insert * find)
Actually this product is just the volume of the cuboid with these metrics as edge lengths. Hence cuboids placed in the center of cells in a 3D grid can show the combined metric and its components in the same visual representation. To facilitate the display, I'd normalize the metrics so that the cuboids approximate a cube on average, and avoid extreme edge values which would extend beyond the cell boundaries. Additionally I'd use a color range for the volume of the cuboids. Cheers Rolf PS: Je suis Charlie!
Re: [OT] Data visualisation by graff (Chancellor) on Feb 07, 2015 at 18:19 UTC
Maybe you could use a canvas where X-offset, Y-offset and shape represent the three parameters, while color, width and height of filled shapes represent metric values. Since the first parameter seems to extend over a large range while the other two have relatively few distinct values, that first one should be the X axis, one of the others should take up ranges along the Y-axis, within which different shapes can be drawn for each value of the last parameter; the first metric looks like a good fit to use color. It might take some practice to figure out how to interpret the image, but there's a reasonable chance that if patterns are present in the data, they'll be visible (when you focus on the right cues). (Update: that said, I think the data-reduction suggestions in the replies above are probably going to lead to an easier/quicker assessment overall.)
Re^2: [OT] Data visualisation by atcroft (Abbot) on Feb 07, 2015 at 19:23 UTC
Similarly, you could go with a bubble chart, where the X and Y axes represent two of the parameters, and the third is represented by the size of a disk at that (X, Y). Another option might be a heat map, where the third parameter is represented by the color or brightness of the point at (X, Y). Hope that helps. (And after writing the above, I realize that all I did was re-organize what graff said. Reviewing some of the links (and their links) might lead you to some additional options.)
Re: [OT] Data visualisation by davies (Monsignor) on Feb 07, 2015 at 20:47 UTC
Short version: I'd go for a stacked bar graph.
Longer version: The data are complex and any graphic representation must, without lots of specialised knowledge, be a first approximation that is "scrutable and refutable" ("The creative computer", Michie & Johnson). I would therefore start by normalising all three metrics to standard deviations. I'd design my Excel spreadsheet (stop throwing things at me) so that all three were weightable, and the weight I would want to use is money, but that's probably better computed by the beancounters than the programmers. I'd start with all weightings at 1.
I would want to eliminate, before graphing, as many tunings as possible. It looks to me as though you have 10x6x6=360 different possible settings, which I would find too large for a single graph. I'd start by eliminating everything that was above average on all three metrics. Depending on how many that left, I'd proceed by eliminating those above average by x on all three or reinstating those above average on only 2, and so on until getting down to single figures. Of course, the weightings might result in the later reinstatement of originally discarded tunings.
On the 42 points you have given, 18 are better (assuming lower is better) than the mean on all three metrics. I'd be tempted to reverse the sign of the std devs so that high is good, the way the human eye (or at least mine) works naturally. Unweighted, all the selected 18 settings have P1 at 512K.
Without any weightings, I've put up a quick & dirty Excel file at https://gitorious.org/metrics/metrics/source/f882fd2061bb6b31ce71282ae035213cfe8bf254:. I haven't labelled the data points as I'd like because that involves VBA & doesn't play nicely with open source "Excel compatible" (hah!) spreadsheets. Regards, John Davies
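As a rough illustration of the normalise-then-filter step described above, here is a Python sketch using only the standard library. It assumes lower is better and follows the reading that the selected settings are those better than the mean on all three metrics; the rows and labels are invented, and the spreadsheet itself may do things differently.

```python
from statistics import mean, pstdev

METRICS = ("mem", "insert", "find")

# Invented sample rows; not the 42 points from the original question.
rows = [
    {"label": "A", "mem": 10.0, "insert": 1.0, "find": 1.0},
    {"label": "B", "mem": 20.0, "insert": 2.0, "find": 5.0},
    {"label": "C", "mem": 30.0, "insert": 6.0, "find": 3.0},
    {"label": "D", "mem": 40.0, "insert": 7.0, "find": 7.0},
]


def standardise(rows):
    """Return copies of the rows with each metric expressed in std devs from its mean."""
    stats = {}
    for m in METRICS:
        values = [r[m] for r in rows]
        stats[m] = (mean(values), pstdev(values))
    out = []
    for r in rows:
        z = dict(r)
        for m in METRICS:
            mu, sd = stats[m]
            # Guard against a constant column, where the std dev is zero.
            z[m] = (r[m] - mu) / sd if sd else 0.0
        out.append(z)
    return out


def better_on_all(zrows):
    """Keep rows below the mean (negative z-score) on every metric."""
    return [r for r in zrows if all(r[m] < 0 for m in METRICS)]


zrows = standardise(rows)
kept = better_on_all(zrows)
print([r["label"] for r in kept])
```

Negating the z-scores before plotting would give the "high is good" orientation mentioned above.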
Re: [OT] Data visualisation (3D+color) by tye (Sage) on Feb 07, 2015 at 21:17 UTC
Memory: X, Insert: Y, Find: Z, P1: Red, P2: Blue, P3: Green. Allow for rotating your view. - tye
Re: [OT] Data visualisation (Reply to all.) by BrowserUk (Patriarch) on Feb 09, 2015 at 05:33 UTC
Thanks for all the inspiration, guys. What I've settled on is a pseudo-3D field, with tridents (3-point stars or crosses) with the length of each point representing one of the outputs: thus. The inputs (axes) were obvious candidates for log₂ scaling. I tried various methods of reducing the range of the values, and settled upon the simple 'difference from minimum' as giving the best visual representation of the relative values, whilst minimising clutter. Aligning the three star points with the x, y & z axes made it easy to visually distinguish aligned points in 3D, without the need to add extended ticks or grid lines, which tend to clutter the graph. I did consider adding a line or curve connecting the end points of the tridents in each plane, but I couldn't wrap my head around how to code that.
I tried tye's R=mem, G=Insert, B=lookup color blob, but it is impossible to visually order a muddy brown blob, a muddy purple blob and a muddy orange blob. But using 3 primary colours for the 3 output values was a nice idea.
I tried LanX's cuboid, but the area of each face becomes the product of two different measures, so the visual cues as to relative size get muddled. But plotting the three outputs in different planes was the basis of the tridents.
I tried davies' stacked bar charts, but on a 2-D plot, whichever way I oriented them, smaller ones ended up hidden behind bigger ones no matter how I tilted or rotated the view. And grouping the 3rd dimension made for skinny tall bars that were difficult to read. I also tried floating the stacked bars in a 3D field, but still, however I rotated the view, some bars obscured others unless I made them so small they were unreadable. Also, comparing the relative values of bars (other than the datum-aligned (bottom) one) becomes very difficult. But keeping the three outputs distinct by colour, while putting them together to give a combined size cue, works.
Once again, thanks to all who responded.
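The two scalings mentioned above (log₂ for the input axes, 'difference from minimum' for the outputs) might look roughly like this in Python. The sample values and function names are assumptions for illustration only; the actual plotting code is not shown in the thread.

```python
import math


def log2_axis(values):
    """Map input parameter values onto a log2 scale for use as axis positions."""
    return [math.log2(v) for v in values]


def diff_from_min(values):
    """Reduce the range of an output by subtracting its minimum value."""
    lowest = min(values)
    return [v - lowest for v in values]


# Invented sample data: one input axis and one output measure.
p1 = [512, 1024, 2048, 4096]
mem = [130.0, 122.0, 120.0, 125.0]

print(log2_axis(p1))       # [9.0, 10.0, 11.0, 12.0]
print(diff_from_min(mem))  # [10.0, 2.0, 0.0, 5.0]
```

Each trident arm would then be drawn with a length proportional to the corresponding `diff_from_min` value, so the best setting for an output collapses to a zero-length arm.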