A colleague is having problems loading CSV (comma-separated, variable-length) data into Microsoft SSIS, so I told him I'd write a script to help. I looked at Text::CSV, but I don't want to walk him through module installation, as he's a bit scared of Perl already. So I wrote this little script. I also didn't want to have to teach him about < and > on the command line, so the script automatically generates the output file name from the input file name with ".tab" on the end.

The rules are the standard Excel-style CSV rules: any embedded '"' characters get doubled up, and any value that contains a ',' character must have '"' characters delimiting the value. Other values don't need delimiters.
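To make the rules concrete, here is a minimal sketch of the writing side, i.e. how a value ends up quoted in the first place. The helper name csv_quote is hypothetical and not part of the script below, and it assumes that a value containing an embedded '"' is also delimited, as Excel does.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of the original script): quote one field
# following the rules described above.
sub csv_quote {
    my ($value) = @_;
    # Assumption: only values containing a comma or a double quote
    # need delimiters; everything else is passed through unchanged.
    return $value unless $value =~ /[",]/;
    (my $escaped = $value) =~ s/"/""/g;    # double up embedded quotes
    return qq{"$escaped"};
}

print join(',', map { csv_quote($_) } 'plain', 'a,b', 'say "hi"'), "\n";
# prints: plain,"a,b","say ""hi"""
```

The converter has to undo exactly this: strip the delimiters, collapse each doubled quote back to one, and treat only the commas outside a quoted value as field separators.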

The code is a simple state machine that processes one character at a time and keeps two state variables: one records whether an opening " has been seen, and the other records whether the previous character was a " inside a quoted value.

Shortcomings:

Doesn't handle newlines in quoted values

    use strict;
    use warnings;

    # Note: doesn't handle newlines in quoted values
    my $out = $ARGV[0] . ".tab";
    open my $out_fh, '>', $out or die "Can't open output $out: $!\n";

    while (<>) {
        my $tab = "";
        my $qv  = 0;    # Quoted value indicator
        my $dq  = 0;    # Double quote flag: the previous character was a "
        for (split //) {
            # Start of a quoted value
            if (not $qv and $_ eq '"') { $qv = 1; next; }
            # Second of two consecutive double quotes within a quoted value:
            # emit one literal ". This must be tested before the next check,
            # which would otherwise swallow every doubled quote.
            if ($qv and $dq and $_ eq '"') { $dq = 0; $tab .= $_; next; }
            # Double quote within or at the end of a quoted value
            if ($qv and $_ eq '"') { $dq = 1; next; }
            # If the last char was a double quote OR we're not within a
            # quoted value, comma = tab
            if (($dq or not $qv) and $_ eq ',') { $dq = 0; $qv = 0; $_ = "\t"; }    # End of field
            $tab .= $_;
        }
        print {$out_fh} $tab;
    }
    close $out_fh or die "Can't close output $out: $!\n";
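One possible way around the newline shortcoming is to build each logical record before running the character loop. This is only a sketch under one assumption: since embedded quotes are always doubled, a line with an odd number of " characters ends inside a quoted value. The helper name read_record is hypothetical and is not wired into the script above.

```perl
use strict;
use warnings;

# Hypothetical helper: read one logical CSV record from a filehandle,
# joining physical lines while a quoted value is still open.
sub read_record {
    my ($fh) = @_;
    my $record = <$fh>;
    return unless defined $record;
    # tr/"// with an empty replacement list only counts the quotes.
    # An odd count means a quoted value is still open (assumption:
    # embedded quotes are always doubled).
    while (($record =~ tr/"//) % 2) {
        my $more = <$fh>;
        last unless defined $more;    # unterminated quote at end of file
        $record .= $more;
    }
    return $record;
}

# Usage with an in-memory filehandle
my $data = qq{1,"two\nlines",3\n4,5,6\n};
open my $fh, '<', \$data or die "Can't open in-memory file: $!\n";
while (defined(my $rec = read_record($fh))) {
    my $lines = ($rec =~ tr/\n//);
    print "record spans $lines physical line(s)\n";
}
```

The embedded newline would still end up in the tab-delimited output, so it would probably need replacing (with a space, say) before loading into SSIS.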

In reply to Converting CSV to tab-delimited by PhilHibbs
