
OK, I'll answer my own question here. The solution I settled on was to use YAML to specify a handful of simple file formats, and Perl to ingest them and transform one format into another. The reason for having different types of input files is to separate the data model (performance data) from the view (graph), and to be able to reuse config information (e.g. metric definitions). At the final stage, where we need to marry the data series to HTML and JavaScript, we use the venerable Template Toolkit. The result is a data file which is durable, computer readable, visually pleasing, and self-contained, plus HTML/JavaScript graphs (or pick some other technology/library), and some reusable metadata files (e.g. metric defs, graph config).

I'm amazed at how easy this is with YAML/Perl/TT.

Step one is to take the logfiles emitted by the load client, which consist of single lines of name=value pairs. This is an extremely useful format for these types of programs, which tend to be written in C or C++, since it takes very little code to output. Emitting YAML at this stage is, in my opinion, overkill. The "single-line, name-value pair metric" format is a pretty easy pill to swallow, even for very high performance programs where we don't want to introduce too much of a burden.
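To make the log format concrete, here is a minimal sketch of reading such lines into a hash of arrays, one array per metric. It assumes pairs are separated by whitespace, that names and values contain no whitespace, and that every line carries the same metrics; the sub names and the sample metrics are made up for illustration and are not taken from the original tools.

```perl
use strict;
use warnings;

# Parse one log line of whitespace-separated name=value pairs into a hash ref.
# Assumption: names and values contain no whitespace.
sub parse_metric_line {
    my ($line) = @_;
    my %pairs;
    for my $token (split ' ', $line) {
        my ($name, $value) = split /=/, $token, 2;
        next unless defined $value;    # skip tokens that are not name=value
        $pairs{$name} = $value;
    }
    return \%pairs;
}

# Collect each metric into its own array, one entry per log line.
sub collect_series {
    my (@lines) = @_;
    my %series;
    for my $line (@lines) {
        my $pairs = parse_metric_line($line);
        push @{ $series{$_} }, $pairs->{$_} for keys %$pairs;
    }
    return \%series;
}

# Hypothetical sample lines; real metric names will differ.
my $series = collect_series(
    'step=1 msgs_per_sec=1200 latency_ms=3.1',
    'step=2 msgs_per_sec=2350 latency_ms=3.4',
);
print join(',', @{ $series->{msgs_per_sec} }), "\n";
```

If a metric can be absent from some lines, the arrays would drift out of step with each other, so a real tool would want to pad or reject such lines.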

The next step is to produce what I call a "yaml-gram". A yaml-gram is a "self-contained, visually pleasing, data-oriented instance of a typed YAML document". In my case, this was performance data from load tests on a messaging system, so the type of the document was PTD (performance test data), which has a simple schema. To make the data "self-contained" I found it useful to include not only the metric data, but also the _meanings_ of the metrics, their units, and the test setups. This is necessary so that the data can be properly reused. If you don't know the meaning of a metric, or there is some ambiguity, it becomes junk data (I found that a unit of measure and a description, keyed to the metric name, were sufficient). In addition, the test setups should be specified such that they can be used as input to drive the tests again. Note that the description of the metrics stays the same for large numbers of tests, and therefore becomes its own YAML file, which gets married to the data file to produce the final PTD YAML file. There is also a TT template for any boilerplate for this particular test series (e.g. software versions used). Ingesting these various file types (logfile, metric meta file, TT file) and operating on them is so simple with YAML, it's fun.
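As a rough illustration of marrying the metric definitions to the data, the sketch below builds one in-memory document and refuses any metric that lacks a unit or a description. The key names (`type`, `setup`, `descriptions`, `units`, `data`) and the sample values are assumptions made for this example, not the actual PTD schema. In practice the two input hashes would come from a YAML loader and the result would be dumped back out as YAML.

```perl
use strict;
use warnings;

# Build a self-contained document from data, metric metadata and test setup.
# Dies if any metric in the data lacks a definition, a unit or a description.
sub build_ptd {
    my (%args) = @_;
    my ($data, $meta, $setup) = @args{qw(data meta setup)};
    my (%units, %descriptions);
    for my $name (sort keys %$data) {
        my $def = $meta->{$name}
            or die "no metric definition for '$name'\n";
        for my $field (qw(unit description)) {
            die "metric '$name' has no $field\n"
                unless defined $def->{$field} && length $def->{$field};
        }
        $units{$name}        = $def->{unit};
        $descriptions{$name} = $def->{description};
    }
    # Sections are kept separate (rather than nested per metric) for readability.
    return {
        type         => 'PTD',
        setup        => $setup,
        descriptions => \%descriptions,
        units        => \%units,
        data         => $data,
    };
}

# Hypothetical inputs for illustration only.
my $ptd = build_ptd(
    data => { step => [ 1, 2 ], latency_ms => [ 3.1, 3.4 ] },
    meta => {
        step       => { unit => 'count', description => 'Load step number' },
        latency_ms => { unit => 'ms',    description => 'Mean delivery latency' },
    },
    setup => { clients => 4 },
);
print join(',', sort keys %{ $ptd->{units} }), "\n";
```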

One thing I found challenging was dumping the YAML file. Sure, Dump() "works", but it doesn't render things in a very pleasing format. Since YAML is designed to be human readable, I found it necessary to concentrate on the human-readable aspects of the file to make it visually pleasing (there is probably a Perl lib that does this). I wound up writing a little formatter which takes the metrics (a hash of one-dimensional arrays of numbers) and block formats them so that they have consistent padding and spacing, which makes the file really readable. At this stage we have a computer readable, self-contained (much of the data needed to understand and use the metrics is in the same document), visually pleasing yaml-gram (PTD). All the pieces of it are very useful, and we likely won't have to change this format, since data models are usually fairly straightforward. I did find that I organized the data _much differently_ than I would have had I used XML. This was because I concentrated on human readability (thus the data, descriptions and units are their own sections, rather than being colocated under each metric).
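The formatter itself isn't shown in the post, so the following is only a guess at what such a block formatter might look like. It writes each metric as a YAML flow sequence, right-justifies every number to a common width, and lines up the opening brackets. The sub name and the sample metrics are hypothetical, and it is worth checking that your YAML loader accepts padded flow sequences before relying on this layout.

```perl
use strict;
use warnings;
use List::Util qw(max);

# Render a hash of one-dimensional numeric arrays as aligned YAML flow
# sequences, one metric per line, sorted by metric name.
sub format_metrics {
    my ($metrics, $indent) = @_;
    $indent = 2 unless defined $indent;
    my @names = sort keys %$metrics;
    return '' unless @names;
    # Widest metric name and widest number decide the column widths.
    my $name_width = max map { length $_ } @names;
    my $num_width  = max 1, map { length $_ } map { @$_ } values %$metrics;
    my $out = '';
    for my $name (@names) {
        my $cells = join ', ',
            map { sprintf '%*s', $num_width, $_ } @{ $metrics->{$name} };
        $out .= sprintf "%s%-*s [%s]\n",
            ' ' x $indent, $name_width + 1, "${name}:", $cells;
    }
    return $out;
}

# Hypothetical metrics for illustration only.
print "data:\n", format_metrics({
    step       => [ 1,   2,   3 ],
    latency_ms => [ 3.1, 3.4, 12.25 ],
});
```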

Next we take a YAML input file which provides the information needed to make an abstract scatter graph. We indicate a "common" series for the X axis, which will be shared; this is typically the step value in the tests. We can also indicate several "Y" series, which share the X coordinates. The metrics are stored as one-dimensional arrays, so you combine the common series with each Y series to get an array of coordinate pairs. Other details, like labels, are added at this point. So this stage reads our PTD file and the graph file, selects the pieces it needs, and puts them together into a single hash to pass to Template Toolkit. TT then renders an HTML file containing the JavaScript graph display code. I used a package called Flot, which is pure JavaScript and claims to be cross-browser. I just took one of the example files and did "cut and paste" programming with TT: I found the place where it needed an array of numbers and replaced it with a reference to my variables. This part was extremely quick, and I didn't really have to learn much about the graph package, which was my intent.
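Combining the common series with each Y series might look something like the sketch below. The graph-config keys (`common`, `series`, `metric`, `label`) are invented for this example; the output shape, a list of `{ label, data }` entries where `data` is an array of `[x, y]` pairs, is the form Flot accepts for a series. JSON::PP ships with Perl and is used here only to turn the structure into text that a template could drop into the page.

```perl
use strict;
use warnings;
use JSON::PP ();

# Pair a shared X series with one Y series: ([1,2],[10,20]) -> [[1,10],[2,20]].
# Adding 0 forces numeric values so they are not emitted as JSON strings.
sub pair_series {
    my ($x, $y) = @_;
    die "series lengths differ\n" unless @$x == @$y;
    return [ map { [ $x->[$_] + 0, $y->[$_] + 0 ] } 0 .. $#$x ];
}

# Build one { label, data } entry per Y series named in the graph config.
sub build_graph_series {
    my ($graph, $data) = @_;
    my $x = $data->{ $graph->{common} }
        or die "unknown common series '$graph->{common}'\n";
    my @series;
    for my $spec (@{ $graph->{series} }) {
        my $y = $data->{ $spec->{metric} }
            or die "unknown metric '$spec->{metric}'\n";
        push @series, { label => $spec->{label}, data => pair_series($x, $y) };
    }
    return \@series;
}

# Hypothetical graph config and data for illustration only.
my $graph_series = build_graph_series(
    { common => 'step', series => [ { metric => 'latency_ms', label => 'Latency (ms)' } ] },
    { step => [ 1, 2 ], latency_ms => [ 3.1, 3.4 ] },
);
print JSON::PP->new->canonical->encode($graph_series), "\n";
```

The encoded string can then be passed to Template Toolkit as an ordinary variable and substituted where the Flot example had its literal array.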

The result was extremely pleasing: auto-generated, interactive JavaScript graphs. The Perl command-line transformation tools which do each step may of course be chained together to form a pipeline (another thing that's wonderfully simple using Perl and Getopt::Long). Further, by focusing on the file formats and data transformations, the problem is solved in a general way that doesn't tie me to a particular rendering solution. I'm very impressed with the Perl/YAML/TT combination.
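For what it's worth, the option handling for one stage of such a pipeline can be very small. The sketch below uses `GetOptionsFromArray` from Getopt::Long; the option names (`--input`, `--output`, `--meta`) and the convention that `-` means STDIN/STDOUT are assumptions for illustration, not the options of the original tools.

```perl
use strict;
use warnings;
use Getopt::Long qw(GetOptionsFromArray);

# Parse the options of one pipeline stage from an argument list.
# Returns a hash ref; dies with a usage message on unknown options.
sub parse_stage_options {
    my (@args) = @_;
    my %opt = (input => '-', output => '-');    # '-' means STDIN / STDOUT
    GetOptionsFromArray(
        \@args,
        'input=s'  => \$opt{input},
        'output=s' => \$opt{output},
        'meta=s'   => \$opt{meta},
    ) or die "usage: $0 [--input FILE] [--output FILE] [--meta FILE]\n";
    return \%opt;
}

my $opt = parse_stage_options(@ARGV);
print "reading from $opt->{input}, writing to $opt->{output}\n";
```

Defaulting both ends to the standard streams is what lets the stages be chained with ordinary shell pipes, while still allowing explicit files when a stage is run on its own.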

In reply to Re: Performance Data and Graphing Metadata File Formats and Transformations by zerohero
in thread Performance Data and Graphing Metadata File Formats and Transformations by zerohero
