in reply to Dumping compact YAML

You have posted two questions today that seem concerned with "heaviness" - this node and Counting line of code. Unless you are working with a large number of records (100K and up) and have very, very tight memory, disk-space, bandwidth, or interprocess-communication constraints (e.g. realtime applications), saving a few whitespace characters is not going to make much difference to the speed or efficiency of your program. In the example above you saved a grand total of 10 characters, or (N*4)-2 where N is the number of items in your list.

Perhaps you could share with us the reason this is an issue for you? It might help us do a better job of suggesting solutions.

Even if you have a very good reason for being concerned about space or format, I caution you about using JSON in place of YAML. Despite claims that JSON is a subset of YAML there are some important and potentially significant differences between the two. Please see Re: Caching or using values across multiple programs for details.

Best, beth

Re^2: Dumping compact YAML
by dgaramond2 (Monk) on Jul 21, 2009 at 14:04 UTC

    Thanks for the responses.

    For my application, I am storing database row diffs in a TEXT column. For each change (UPDATE, INSERT, DELETE) a diff will be stored to allow undo/redo to previous versions. I expect that much of the time the change will be an UPDATE of only one or two columns (out of several/many). I will be storing the changes as a hash (technically a hashref in Perl) of column names and values.
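A minimal sketch of that diff idea, assuming the old and new rows arrive as hashrefs keyed by column name (the helper name `row_diff` is invented here for illustration):

```perl
use strict;
use warnings;

# Build a diff containing only the columns whose values changed.
# Serializing just this hashref is enough to support undo of an
# UPDATE: apply the 'old' values back over the current row.
sub row_diff {
    my ($old, $new) = @_;
    my %diff;
    for my $col (keys %$new) {
        my $was = $old->{$col};
        my $now = $new->{$col};
        next if !defined $was && !defined $now;
        next if defined $was && defined $now && $was eq $now;
        $diff{$col} = { old => $was, new => $now };
    }
    return \%diff;
}

my $before = { id => 42, name => 'alice', email => 'a@example.com' };
my $after  = { id => 42, name => 'alice', email => 'alice@example.com' };
my $diff   = row_diff($before, $after);
# only the changed 'email' column ends up in $diff
```

Only the changed columns are stored, which is what keeps the per-change blobs small regardless of how wide the table is.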

    YAML or JSON interests me because of the balance between readability and compactness. Of course, shaving off whitespace will only save a few bytes in this case and in my particular example. But compacting a deep data structure obviously saves a bit more. For example, compare the compact YAML representation of [1, [2, [3, [4, [5, [6, [7, [8, [9, 10]]]]]]]]] (49 bytes) with its non-compact one (230 bytes): about a 4.7x compression ratio.
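The same compact-versus-indented gap can be demonstrated with core JSON::PP (a stand-in here, since a YAML module may not be installed everywhere; the byte counts are for JSON, so they differ slightly from the YAML figures above):

```perl
use strict;
use warnings;
use JSON::PP;

my $deep = [1, [2, [3, [4, [5, [6, [7, [8, [9, 10]]]]]]]]];

# Default encoding emits no extra whitespace at all (flow style);
# ->pretty puts each item on its own line with growing indentation.
my $compact = JSON::PP->new->encode($deep);
my $pretty  = JSON::PP->new->pretty->encode($deep);

printf "compact: %d bytes\n", length $compact;
printf "pretty:  %d bytes\n", length $pretty;
# The indented form is several times larger at this nesting depth.
```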

      Well, if you want to bring compression into the picture, the difference is back down to around 14-20 bytes, which again is most likely chump change in the big picture. :)

      $ l {foo,bar}.yml*
      -rw-r--r-- 1 fletch fletch 275 Jul 21 11:28 bar.yml
      -rw-r--r-- 1 fletch fletch  91 Jul 21 11:28 bar.yml.bz2
      -rw-r--r-- 1 fletch fletch  80 Jul 21 11:29 bar.yml.gz
      -rw-r--r-- 1 fletch fletch 110 Jul 21 11:29 foo.yml
      -rw-r--r-- 1 fletch fletch  77 Jul 21 11:29 foo.yml.bz2
      -rw-r--r-- 1 fletch fletch  61 Jul 21 11:29 foo.yml.gz
      $ for i in {foo,bar}.yml ; { print $i ; cat $i }
      foo.yml
      ---
      a: [1, [2, [3, [4, [5, [6, [7, [8, [9, [10]]]]]]]]]]
      b: [1, [2, [3, [4, [5, [6, [7, [8, [9, [10]]]]]]]]]]
      bar.yml
      ---
      a:
      - 1
      - - 2
        - - 3
          - - 4
            - - 5
              - - 6
                - - 7
                  - - 8
                    - - 9
                      - - 10
      b:
      - 1
      - - 2
        - - 3
          - - 4
            - - 5
              - - 6
                - - 7
                  - - 8
                    - - 9
                      - - 10

      Of course, if you're compressing before tossing blobs into your DB you've lost at least immediate readability (but then, on the other hand, you're only a short helper utility away from having it back human readable and nicely indented).
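The compression point above can be sketched with the core IO::Compress::Gzip module (the two strings below mimic the flow-style and block-style files, not the exact ones from the listing):

```perl
use strict;
use warnings;
use IO::Compress::Gzip qw(gzip $GzipError);

# Flow-style (compact) and block-style (indented) renderings of
# the same nested list under one key.
my $compact = "a: [1, [2, [3, [4, [5, [6, [7, [8, [9, [10]]]]]]]]]]\n";
my $block   = "a:\n- 1\n";
$block .= ('  ' x ($_ - 2)) . "- - $_\n" for 2 .. 10;

gzip \$compact => \my $compact_gz or die "gzip failed: $GzipError";
gzip \$block   => \my $block_gz   or die "gzip failed: $GzipError";

printf "raw:  %3d vs %3d bytes\n", length $compact, length $block;
printf "gzip: %3d vs %3d bytes\n", length $compact_gz, length $block_gz;
# The highly repetitive block form compresses very well, so the
# raw-size gap narrows considerably after compression.
```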

      As another suggestion, if you've got a (relatively) small class of input data you might just roll your own serialization routine which spits out a more compact YAML representation, rather than using one of the off-the-shelf modules.
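A bare-bones sketch of that idea: a recursive routine (the name `to_flow` is made up here) that emits YAML-style flow notation for nested arrays and hashes of plain scalars. It implements none of YAML's quoting or escaping rules, which is exactly the trade-off of rolling your own:

```perl
use strict;
use warnings;

# Emit a compact, YAML-flow-style string for nested arrays/hashes
# of simple scalars. No quoting or escaping -- suitable only for
# data you control (e.g. numeric IDs, known-safe column values).
sub to_flow {
    my ($data) = @_;
    my $ref = ref $data;
    if ($ref eq 'ARRAY') {
        return '[' . join(', ', map { to_flow($_) } @$data) . ']';
    }
    elsif ($ref eq 'HASH') {
        return '{' . join(', ',
            map { "$_: " . to_flow($data->{$_}) } sort keys %$data) . '}';
    }
    return defined $data ? $data : '~';    # ~ is YAML's null
}

print to_flow([1, [2, [3, [4, 5]]]]), "\n";   # [1, [2, [3, [4, 5]]]]
print to_flow({ a => 1, b => [2, 3] }), "\n"; # {a: 1, b: [2, 3]}
```

Keys are sorted so the output is deterministic, which also helps if you ever diff or deduplicate the stored blobs.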

      The cake is a lie.