I like it, but I'll give an example of a CSV format I'm using that is pathological and would make the autodetection pick the wrong separator. It's a special case, and the number of "customers" for it is very restricted.

I have a data acquisition system made by someone else that reads a few hundred parameters every 20 seconds or so. It writes them to disk locally and spits them out via UDP to a closed network (because of some firewall requirements).

The data format for each line is:

val1,val2, val3,...valN,;,header1, header2, header3,...,headerN <CRLF>

but the first line of any given file is:

header1, header2, header3,...,headerN,;,header1, header2, header3,...,headerN <CRLF>

where the semicolon field tells the reader that the value list has ended and the rest of the line is header information.
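Here is a minimal sketch of how a reader might split such a record. It assumes that fields never contain quoted or embedded commas, that the marker is a field consisting solely of `;`, and that every record has as many values as headers. `parse_line` and `is_header_record` are hypothetical helpers for illustration and are not part of any existing module:

```python
from typing import List, Tuple


def parse_line(line: str) -> Tuple[List[str], List[str]]:
    """Split one record into (values, headers) at the lone ';' field.

    Hypothetical helper: assumes no quoted fields or embedded commas.
    """
    # Drop the trailing CRLF, split on commas, trim the optional spaces.
    fields = [f.strip() for f in line.rstrip("\r\n").split(",")]
    try:
        marker = fields.index(";")
    except ValueError:
        raise ValueError("record has no ';' marker field") from None
    values, headers = fields[:marker], fields[marker + 1:]
    # Assumed invariant: one header per value.
    if len(values) != len(headers):
        raise ValueError(f"{len(values)} values but {len(headers)} headers")
    return values, headers


def is_header_record(line: str) -> bool:
    """True for a file's first line, where the value list repeats the headers."""
    values, headers = parse_line(line)
    return values == headers


if __name__ == "__main__":
    values, headers = parse_line("1.5,2.0, 3.25,;,temp, pressure, flow\r\n")
    print(dict(zip(headers, values)))  # {'temp': '1.5', 'pressure': '2.0', 'flow': '3.25'}
```

Because every record carries its own headers, a reader can label the values without having seen the first line of the file.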

The reason for the first line is that whoever reads the file is likely to load it into Excel, and putting the header at the top reduces the risk of column-alignment errors from copying the header information to the top of the Excel file.

The reasoning behind the line format is that the listener on the UDP port (written in Perl), which makes a backup copy of the data and serves it to both a web page and a LabVIEW program that displays data plots, knows nothing about the details of the sender. If the sender software is updated to add data columns, the listener has to notice and still display and label them properly. So if the listener sees a change in the header list, it starts a new file with the correct header at the top.
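The listener's rotation rule can be sketched as a pure function. The hypothetical `split_into_files` below groups a stream of records into per-file chunks. It starts a new chunk, headed by a line in the sender's first-line format, whenever the header list changes. A real listener would open a new file at each boundary instead of appending to a list. The sketch assumes simple comma splitting with no quoted fields, and the names are illustrative only:

```python
from typing import Iterable, List, Optional, Tuple


def split_record(line: str) -> Tuple[List[str], List[str]]:
    """Return (values, headers) for one record; assumes no quoted commas."""
    fields = [f.strip() for f in line.rstrip("\r\n").split(",")]
    marker = fields.index(";")  # raises ValueError if the marker is missing
    return fields[:marker], fields[marker + 1:]


def split_into_files(lines: Iterable[str]) -> List[List[str]]:
    """Group records into chunks, starting a new chunk when the headers change."""
    files: List[List[str]] = []
    current: Optional[List[str]] = None
    for line in lines:
        values, headers = split_record(line)
        if headers != current:
            # New or changed header list (e.g. the sender added columns):
            # start a new "file" whose first line repeats the headers.
            current = headers
            files.append([",".join(headers + [";"] + headers)])
        if values == headers:
            # The sender's own header record; the current chunk already starts with one.
            continue
        files[-1].append(line.rstrip("\r\n"))
    return files


if __name__ == "__main__":
    stream = ["1,2,;,a, b\r\n", "3,4,;,a, b\r\n", "5,6,7,;,a, b, c\r\n"]
    for chunk in split_into_files(stream):
        print(chunk)
    # ['a,b,;,a,b', '1,2,;,a, b', '3,4,;,a, b']
    # ['a,b,c,;,a,b,c', '5,6,7,;,a, b, c']
```

Comparing whole header lists, rather than just counting columns, also catches a renamed or reordered column.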

Your autodetect looks like it would catch the semicolon and stop looking for separators. I admit this is a pathological case that you're unlikely to see in the wild, but oddball cases with more than one potential separator in the header line do exist. You might want an option to throw an error if more than one potential separator appears in the first line, and/or an option to count the instances of each potential separator and pick the one with the most. I'd rather see it default to an error if there are multiple potential separators in the header, with a flag that lets me suppress the error and have it guess.
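The proposed behaviour could look roughly like this. It errors by default when several candidate separators appear in the header line, and an opt-in flag lets it guess the most frequent one instead. `detect_separator`, `AmbiguousSeparatorError`, the `guess` flag and the default candidate list are all hypothetical names chosen for illustration. They are not the API of any existing CSV module. The sketch also counts raw characters and ignores quoting:

```python
from typing import Iterable


class AmbiguousSeparatorError(ValueError):
    """Raised when the header line contains more than one candidate separator."""


def detect_separator(header_line: str,
                     candidates: Iterable[str] = (",", ";", "\t", "|"),
                     guess: bool = False) -> str:
    # Count every candidate in the raw header line (quoting is ignored).
    counts = {c: header_line.count(c) for c in candidates}
    present = {c: n for c, n in counts.items() if n > 0}
    if not present:
        raise ValueError("no candidate separator found in header line")
    if len(present) > 1 and not guess:
        raise AmbiguousSeparatorError(
            f"multiple candidate separators in header: {present}")
    # With guess=True (or a single candidate), pick the most frequent one;
    # ties go to whichever candidate is listed first.
    return max(present, key=present.get)


if __name__ == "__main__":
    first = "temp, pressure, flow,;,temp, pressure, flow\r\n"
    try:
        detect_separator(first)
    except AmbiguousSeparatorError as err:
        print("error:", err)
    print(repr(detect_separator(first, guess=True)))  # ','
```

On the first line of the format above, the default call refuses to choose. With `guess=True` it picks the comma, because commas outnumber the single semicolon.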


In reply to Re: CSV headers. Feedback wanted by bitingduck
in thread CSV headers. Feedback wanted by Tux
