I've done a number of file formats as well, and there are two pieces of advice I'd like to add to your excellent list:

  1. Explicitly specify your escape methodology: if you are creating a CSV file, how will a comma in the data be escaped?
  2. If possible, use record and unit separators that are unlikely to exist in your data: for example, I like to use the ASCII chars \x1E\x0A ("Record Separator" + newline) and \x1F ("Unit Separator") to separate records and elements, respectively. These are unlikely to appear in text data (unlike commas, tabs, etc.) and reduce the complexity of the escaping strategy that will be required.
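
To make the difference concrete, here is a minimal sketch (the sample record is made up for illustration) that joins the same fields once on a comma with no escaping rule and once on the Unit Separator, then splits each line back apart:

```perl
use strict;
use warnings;

# Hypothetical sample record: the second field contains a comma.
my @fields = ('42', 'Wall, Larry', 'Perl');

# Naive CSV: join on commas with no escaping rule defined.
my $csv_line = join(',', @fields);
my @from_csv = split(/,/, $csv_line);

# The same record joined on the ASCII Unit Separator instead.
my $us_line = join("\x1F", @fields);
my @from_us = split(/\x1F/, $us_line);

# The comma version comes back with one field too many.
printf "comma: %d fields, unit separator: %d fields\n",
    scalar(@from_csv), scalar(@from_us);
```

The comma-joined line comes back as four fields instead of three, which is exactly the ambiguity an explicit escaping rule has to resolve.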

In many cases, combining these can result in "the record-separator and element-separator chars are not allowed in text data" as an escaping strategy. This means you can use code like:

    open my $F_data, '<', 'filename.dat' or die("bad open: $!");
    local $/ = "\x1E\x0A";    # input record separator ($/, not $\)
    while (<$F_data>) {
        chomp;                # strip the trailing record separator
        my @row = split("\x1F", $_);
        process(\@row);
    }

instead of relying on (admittedly excellent) modules like Text::CSV_XS. Using these chars tremendously simplifies one's life!
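
The writing side needs to enforce the "not allowed in text data" rule, or the reader's simple split stops being safe. A rough sketch of one way to do that follows; format_record and parse_records are names I made up for illustration, not part of any module:

```perl
use strict;
use warnings;

my $RS = "\x1E\x0A";    # Record Separator + newline
my $US = "\x1F";        # Unit Separator

# Hypothetical helper: serialize one row, refusing any field that
# contains a reserved separator character.
sub format_record {
    my @fields = @_;
    for my $field (@fields) {
        die "field contains a reserved separator character\n"
            if $field =~ /[\x1E\x1F]/;
    }
    return join($US, @fields) . $RS;
}

# Hypothetical helper: parse a whole buffer back into rows.
# The -1 limit keeps trailing empty fields within a record.
sub parse_records {
    my ($buffer) = @_;
    return map { [ split /\x1F/, $_, -1 ] } split /\x1E\x0A/, $buffer;
}

# Commas, tabs, and bare newlines need no escaping at all.
my $data = format_record('1', 'Hello, world', "two\nlines")
         . format_record('2', '', "tab\there");
my @rows = parse_records($data);

printf "%d rows, %d fields in the first\n", scalar(@rows), scalar(@{ $rows[0] });
```

Rejecting bad input at write time is the simplest policy; if your data might legitimately contain these characters, you are back to item 1 and need a documented escape.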

<-radiant.matrix->
Larry Wall is Yoda: there is no try{} (ok, except in Perl6; way to ruin a joke, Larry! ;P)
The Code that can be seen is not the true Code
"In any sufficiently large group of people, most are idiots" - Kaa's Law

In reply to Re: Thoughts on designing a file format. by radiantmatrix
in thread Thoughts on designing a file format. by demerphq
