in reply to Re: Dumping compact YAML
in thread Dumping compact YAML

Thanks for the responses.

For my application, I am storing database row diffs in a TEXT column. For each change (UPDATE, INSERT, DELETE), a diff is stored to allow undo/redo to previous versions. Most of the time the change is expected to be an UPDATE of only one or two columns (out of several or many). I will be storing the changes as a hash (technically a hashref, in Perl) of column names and values.
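To make the idea concrete, here is a minimal sketch of such a column-level diff, written in Python for brevity (the same logic ports directly to a Perl hashref). The names `row_diff` and `apply_diff` are hypothetical, and keeping the old value next to the new one is an assumption made here so that undo works. It only covers the UPDATE case; INSERT and DELETE would need a whole-row marker of some kind.

```python
import json

def row_diff(old, new):
    """Map each changed column to its [old, new] pair (assumed layout)."""
    cols = sorted(set(old) | set(new))
    return {c: [old.get(c), new.get(c)] for c in cols if old.get(c) != new.get(c)}

def apply_diff(row, diff, undo=False):
    """Return a copy of row with the diff applied (redo) or reverted (undo)."""
    idx = 0 if undo else 1
    out = dict(row)
    for col, pair in diff.items():
        out[col] = pair[idx]
    return out

# Hypothetical row before and after an UPDATE touching two columns.
old = {"id": 7, "name": "Alice", "email": "alice@example.com", "city": "Oslo"}
new = {"id": 7, "name": "Alice", "email": "alice@example.org", "city": "Bergen"}

diff = row_diff(old, new)
# Compact text suitable for a TEXT column: no spaces after "," or ":".
blob = json.dumps(diff, separators=(",", ":"))
print(blob)
```

Only the changed columns end up in the stored text, so the size of the diff tracks the size of the change rather than the width of the table.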

YAML or JSON interests me because of the balance between readability and compactness. Of course, shaving off whitespace will only save a few bytes in my particular case. But compacting a deep data structure obviously saves a bit more. For example, compare [1, [2, [3, [4, [5, [6, [7, [8, [9, 10]]]]]]]]]: its compact representation in YAML is 49 bytes and its non-compact one is 230 bytes, a roughly 4.7x size reduction.
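The comparison can be reproduced roughly with a short script. This sketch uses Python's stdlib `json` module rather than a YAML dumper, so the exact byte counts will differ from the ones quoted above; the compact JSON text it produces is also a valid YAML flow sequence.

```python
import json

# Build [1, [2, [3, ... [9, 10]]]] programmatically.
nested = [9, 10]
for n in range(8, 0, -1):
    nested = [n, nested]

compact = json.dumps(nested)             # single line, ", " between items
indented = json.dumps(nested, indent=2)  # one item per line, indented by depth

print(compact)
print(len(compact), len(indented), round(len(indented) / len(compact), 1))
```

The deeper the nesting, the more of the indented form is pure leading whitespace, which is why the ratio grows with depth.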

Re^3: Dumping compact YAML
by Fletch (Bishop) on Jul 21, 2009 at 15:50 UTC

    Well, if you want to bring compression into the picture, the difference is back down to around 14-20 bytes, which again is most likely chump change in the big picture. :)

    $ l {foo,bar}.yml*
    -rw-r--r-- 1 fletch fletch 275 Jul 21 11:28 bar.yml
    -rw-r--r-- 1 fletch fletch  91 Jul 21 11:28 bar.yml.bz2
    -rw-r--r-- 1 fletch fletch  80 Jul 21 11:29 bar.yml.gz
    -rw-r--r-- 1 fletch fletch 110 Jul 21 11:29 foo.yml
    -rw-r--r-- 1 fletch fletch  77 Jul 21 11:29 foo.yml.bz2
    -rw-r--r-- 1 fletch fletch  61 Jul 21 11:29 foo.yml.gz
    $ for i in {foo,bar}.yml ; { print $i ; cat $i }
    foo.yml
    ---
    a: [1, [2, [3, [4, [5, [6, [7, [8, [9, [10]]]]]]]]]]
    b: [1, [2, [3, [4, [5, [6, [7, [8, [9, [10]]]]]]]]]]
    bar.yml
    ---
    a:
    - 1
    - - 2
      - - 3
        - - 4
          - - 5
            - - 6
              - - 7
                - - 8
                  - - 9
                    - - 10
    b:
    - 1
    - - 2
      - - 3
        - - 4
          - - 5
            - - 6
              - - 7
                - - 8
                  - - 9
                    - - 10

    Of course, if you're compressing before tossing blobs into your DB you've lost at least immediate readability (but then, on the other other hand, you're just a short helper utility away from being back to hyoomon readable and nicely indented).
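A rough sketch of that trade-off and of the helper utility, in Python with only the stdlib. The function names `pack` and `show` are made up for illustration, JSON stands in for YAML, and `zlib` is used in place of the gzip/bzip2 command-line tools (it is the same DEFLATE algorithm as gzip with a smaller header). Note that the compressed result is binary, so it would need a BLOB column, or base64 encoding for a TEXT one.

```python
import json
import zlib

def pack(data):
    """Serialize compactly, then compress; returns bytes, not text."""
    text = json.dumps(data, separators=(",", ":"))
    return zlib.compress(text.encode("utf-8"))

def show(blob):
    """Helper: decompress a stored blob and return it nicely indented."""
    data = json.loads(zlib.decompress(blob).decode("utf-8"))
    return json.dumps(data, indent=2)

# Hypothetical sample data shaped like the nested example above.
sample = {"a": [1, [2, [3, [4, [5]]]]], "b": [1, [2, [3, [4, [5]]]]]}
stored = pack(sample)
print(len(stored), "bytes stored")
print(show(stored))
```

For payloads this small the compression header overhead can eat most of the gain, which is consistent with the handful of bytes separating the compressed files in the listing.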

    As another suggestion, if you've got a (relatively) small class of input data you might just roll your own serialize routine which spits out a more compact YAML representation rather than using one of the off-the-shelf modules.
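As an illustration of what such a hand-rolled routine might look like, here is a minimal sketch in Python (the recursion translates directly to a Perl sub walking hashrefs and arrayrefs). The function name `flow` and the quoting rules are assumptions chosen for this example: it only handles None, booleans, finite numbers, strings, lists and dicts, and it quotes any string that is not a simple identifier, leaning on the fact that a JSON string literal is also a valid YAML double-quoted scalar.

```python
import json
import re

# Strings matching this pattern are emitted bare; everything else is quoted.
_PLAIN = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")
# Words that YAML 1.1 parsers may read as booleans or null when left bare.
_RESERVED = {"true", "false", "null", "yes", "no", "on", "off", "y", "n"}

def flow(value):
    """Emit value as single-line, flow-style YAML (limited type support)."""
    if value is None:
        return "null"
    if isinstance(value, bool):          # must come before the int check
        return "true" if value else "false"
    if isinstance(value, (int, float)):  # assumes finite numbers
        return repr(value)
    if isinstance(value, str):
        if _PLAIN.match(value) and value.lower() not in _RESERVED:
            return value
        return json.dumps(value)         # JSON string = YAML double-quoted
    if isinstance(value, (list, tuple)):
        return "[" + ", ".join(flow(v) for v in value) + "]"
    if isinstance(value, dict):
        items = (f"{flow(str(k))}: {flow(v)}" for k, v in value.items())
        return "{" + ", ".join(items) + "}"
    raise TypeError(f"unsupported type: {type(value).__name__}")

print(flow({"name": "Alice", "tags": ["a b", "x"], "depth": [1, [2, [3]]]}))
```

Restricting the input to a small, known class of data is what keeps a routine like this short; anything outside that class raises instead of being silently mis-serialized.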

    The cake is a lie.