I've done a number of file formats as well, and there are two pieces of advice I'd like to add to your excellent list:
- Explicitly specify your escape methodology: if you are creating a CSV file, how will a comma in the data be escaped?
- If possible, use record and unit separators that are unlikely to exist in your data: for example, I like to use the ASCII chars \x1E\x0A ("Record Separator"+ newline) and \x1F ("Unit Separator") to separate records and elements, respectively. These are unlikely to appear in text data (unlike columns, tabs, etc.) and reduce the complexity of the escaping strategy that will be required.
In many cases, combining these can result in "the record-separator and element-separator chars are not allowed in text data" as an escaping strategy. This means you can use code like:
open my $F_data, '<', 'filename.dat' or die("bad open: $!");
local $\ = "\x1E\x0A";
while (<$F_data>) {
my @row = split("\x1F", $_);
process (\@row);
}
Instead of relying on (admittedly excellent) modules like Text::CSV_XS. Using these chars tremendously simplifies one's life!
<-radiant.matrix->
Larry Wall is Yoda: there is no try{} (ok, except in Perl6; way to ruin a joke, Larry! ;P)
The Code that can be seen is not the true Code
"In any sufficiently large group of people, most are idiots" - Kaa's Law
-
Are you posting in the right place? Check out Where do I post X? to know for sure.
-
Posts may use any of the Perl Monks Approved HTML tags. Currently these include the following:
<code> <a> <b> <big>
<blockquote> <br /> <dd>
<dl> <dt> <em> <font>
<h1> <h2> <h3> <h4>
<h5> <h6> <hr /> <i>
<li> <nbsp> <ol> <p>
<small> <strike> <strong>
<sub> <sup> <table>
<td> <th> <tr> <tt>
<u> <ul>
-
Snippets of code should be wrapped in
<code> tags not
<pre> tags. In fact, <pre>
tags should generally be avoided. If they must
be used, extreme care should be
taken to ensure that their contents do not
have long lines (<70 chars), in order to prevent
horizontal scrolling (and possible janitor
intervention).
-
Want more info? How to link
or How to display code and escape characters
are good places to start.
|