Nothing you have said so far explains the need for YAML/Data::Dump etc. serialisation to me. Your data is already human readable. Storing the original and processed data in the same file is as simple as adding a separator. You don't need any fancy modules to do it.

I think what you are missing is that the processed data is not text; it's a Perl data structure (a whole bunch of objects that subclass Tree::DAG_Node, actually), so it does need some sort of serialization to be stored.

You are not actually using it to reconstitute a data structure, nor is there any real need as all you want to do is reformat the old data into the new format so you can process it.

No, I am using it precisely to reconstitute a data structure. I'm not converting old data into a new format for processing; I'm ensuring that updates and modifications to the processor for newer formats don't break it for old formats (because I still need it to work on those formats as well).

Basically what's going on is that the data is a tar.gz archive of a lot of information collected from a Linux host (configuration files, command output, /proc contents, similar to what you would get from Red Hat's sysreport tool, or from VMware's vm-support). What format the data is in depends on what version of the operating system was on the host it was collected from. As time goes on, newer versions of Linux and of our software that runs on the host mean that the data in the files may differ from what older versions produced. So I'm collecting this information to make sure that changes made to support newer versions haven't introduced incompatibilities that will make the parser fail on older versions.

So basically, after building a parser for one of these files and confirming that it produces the right output, I run a script that stores both the original text and the serialized output in a file.
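
To make that concrete, here is a rough sketch of what such a capture script might look like. It is not my actual code: Data::Dumper stands in for whatever serialization format ends up being used, and the `parse_text` parser, the `__PARSED__` separator and the `build_fixture`/`write_fixture` names are all made up for illustration.

```perl
use strict;
use warnings;
use Data::Dumper;

# Hypothetical separator between the raw text and the serialized result.
my $SEPARATOR = "\n__PARSED__\n";

# Hypothetical stand-in parser: turns "key: value" lines into a hash ref.
# The real parsers build Tree::DAG_Node subclasses instead.
sub parse_text {
    my ($text) = @_;
    my %data;
    for my $line (split /\n/, $text) {
        next unless $line =~ /^\s*([^:]+?)\s*:\s*(.*?)\s*$/;
        $data{$1} = $2;
    }
    return \%data;
}

# Serialize with sorted keys so the stored output is stable between runs.
sub serialize {
    my ($struct) = @_;
    local $Data::Dumper::Sortkeys = 1;
    local $Data::Dumper::Indent   = 1;
    local $Data::Dumper::Terse    = 1;
    return Dumper($struct);
}

# Combine the original text and the serialized parser output.
sub build_fixture {
    my ($text) = @_;
    return $text . $SEPARATOR . serialize(parse_text($text));
}

# Write the combined fixture to disk for the test suite to pick up later.
sub write_fixture {
    my ($path, $text) = @_;
    open my $fh, '>', $path or die "Cannot write $path: $!";
    print {$fh} build_fixture($text);
    close $fh or die "Cannot close $path: $!";
    return;
}
```

The separator only works if it can never appear in the collected text, which is one reason a plain separator is less simple than it sounds.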

This file is then used later by the test suite, which loads up the original data, and the originally serialized output. Then it runs the original data through the current version of the parser for that file and confirms that given the same original input, the current version of the parser produces the same output as the original.
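
The test side could be sketched roughly like this, again as an illustration under assumptions rather than the real suite: `parse_text` is a made-up stand-in parser, the `__PARSED__` separator and the `load_fixture`/`check_fixture` names are hypothetical, and the stored output is assumed to be a Data::Dumper dump that can be reconstituted with a string eval (only reasonable for fixture files you trust).

```perl
use strict;
use warnings;
use Test::More;

# Hypothetical separator between the raw text and the serialized result.
my $SEPARATOR = "\n__PARSED__\n";

# Hypothetical stand-in for the *current* version of the parser.
sub parse_text {
    my ($text) = @_;
    my %data;
    for my $line (split /\n/, $text) {
        next unless $line =~ /^\s*([^:]+?)\s*:\s*(.*?)\s*$/;
        $data{$1} = $2;
    }
    return \%data;
}

# Split a fixture into the original text and the reconstituted structure.
sub load_fixture {
    my ($fixture) = @_;
    my ($text, $dump) = split /\Q$SEPARATOR\E/, $fixture, 2;
    die "Fixture has no separator\n" unless defined $dump;
    my $expected = eval $dump;    # string eval: trusted fixtures only
    die "Cannot reconstitute stored output: $@" if $@;
    return ($text, $expected);
}

# Re-parse the original text and compare against the stored structure.
sub check_fixture {
    my ($name, $fixture) = @_;
    my ($text, $expected) = load_fixture($fixture);
    return is_deeply(parse_text($text), $expected,
        "$name: parser output unchanged");
}

# Example fixture, inlined here instead of being read from a file.
my $example = "MemTotal: 2048 kB\nSwapTotal: 0 kB\n"
    . $SEPARATOR
    . "{\n  'MemTotal' => '2048 kB',\n  'SwapTotal' => '0 kB'\n}\n";

check_fixture('meminfo', $example);
done_testing();
```

Comparing the reconstituted structure with `is_deeply`, rather than comparing dumped strings, means a failure report points at the part of the structure that changed.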

www.jasonkohles.com
We're not surrounded, we're in a target-rich environment!

In reply to Re^4: Human-readable serialization formats other than YAML? by jasonk
in thread Human-readable serialization formats other than YAML? by jasonk
