Hello Monks,
I have an accelerometer with onboard firmware that generates CSV files with records at 80Hz but only when there is activity. Empirical analysis suggests the mean rate is around 6.0Hz. The CSV has records like this:
    17077.395763,-739,1059,-16734

If you want to see a sample data file, you can download a <5 hour sample here: accel.csv (3.0MB CSV).
The columns are seconds (since recording began), followed by the x, y, z accelerometer values as signed 16-bit integers scaled to 16384 counts per g.
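For clarity, here is how one record decodes (a minimal sketch; the conversion to g just applies the 16384-counts-per-g scale above):

    # Sketch: parse one CSV record and convert raw counts to g
    # using the 16384 counts-per-g scale described above.
    my $line = "17077.395763,-739,1059,-16734";
    my ($t, $x, $y, $z) = split /,/, $line;
    printf "t=%.6f s  x=%+.4f g  y=%+.4f g  z=%+.4f g\n",
        $t, map { $_ / 16384 } $x, $y, $z;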
Now, the problem is these files are a rather inefficient use of space. To the extent that disk is (not) cheap in my application, I need to reduce their size.
As a first cut, I post-process the files with pack('fsss',...), which gives about 3:1 reduction. Note that the last two digits of the timestamp are spurious, so it can be converted to a single-precision float.
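Roughly, that step looks like this (a sketch rather than my exact script; the filenames are placeholders):

    #!/usr/bin/perl
    use strict;
    use warnings;

    # Repack each CSV record as binary: 'f' is a 4-byte single float,
    # 's' a signed 16-bit short, so each record is 10 bytes instead of
    # ~30 bytes of CSV text -- hence roughly 3:1.
    open my $in,  '<', 'accel.csv' or die "accel.csv: $!";
    open my $out, '>', 'accel.bin' or die "accel.bin: $!";
    binmode $out;

    while (<$in>) {
        chomp;
        my ($t, $x, $y, $z) = split /,/;
        print {$out} pack 'fsss', $t, $x, $y, $z;
    }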
Further compressing the output with xz or bzip2 brings that up to about 5.5:1 (compression alone without pack() was about 4:1).
Finally, I started conditionally storing the time as an 8-bit delta in 1/10000-second units whenever the delta from the previous record is small enough (at 80Hz the gap is only 125 ticks, so it is); otherwise I store a (magic) 0 followed by the unsigned 32-bit time in 1/10000 seconds. Hence each record is either Csss (98.5% of records) or CLsss (1.5% of records). This brought the ratio up to 6.5:1 after compression with xz, at the cost of a little more complexity.
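In sketch form, the per-record packing looks something like this (variable names are illustrative, and I'm treating the 32-bit value in the rare case as the absolute tick count):

    # Sketch of the conditional delta encoding. Time is kept in
    # 1/10000-second ticks; 0 is reserved as the magic marker meaning
    # "an absolute 32-bit timestamp follows".
    my $prev_ticks = 0;

    sub pack_record {
        my ($t, $x, $y, $z) = @_;
        my $ticks = int($t * 10_000 + 0.5);
        my $delta = $ticks - $prev_ticks;
        $prev_ticks = $ticks;

        if ($delta >= 1 && $delta <= 255) {
            # Common case (~98.5%): 1-byte delta + three 16-bit samples.
            return pack 'Csss', $delta, $x, $y, $z;
        }
        # Rare case (~1.5%): magic 0, then unsigned 32-bit tick count.
        return pack 'CLsss', 0, $ticks, $x, $y, $z;
    }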
Can I improve significantly on 6.5:1 before I move on to lossy methods, such as reducing the frequency and resolution?