The rules are the standard Excel-style CSV rules: any embedded '"' characters are doubled, and any value that contains a ',' character must be delimited by '"' characters. Other values don't need delimiters.
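To make those rules concrete, here is a minimal sketch of the writing side, i.e. how a single value would be quoted under them. The helper name csv_quote is made up for illustration, and it assumes that a value containing a '"' also gets delimited, which is how Excel behaves as far as I know.

```perl
use strict;
use warnings;

# Hypothetical helper: quote one value according to the rules above.
sub csv_quote {
    my ($value) = @_;
    # Values without a comma or a double quote are written as-is
    return $value unless $value =~ /[",]/;
    # Double up embedded quotes, then wrap the whole value in quotes
    (my $escaped = $value) =~ s/"/""/g;
    return qq{"$escaped"};
}

# Prints:  plain   then   "a,b"   then   "say ""hi"""
print csv_quote($_), "\n" for ('plain', 'a,b', 'say "hi"');
```

The converter has to undo exactly this: strip the delimiters and collapse each doubled quote back to a single one.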
The code is a simple state machine that processes one character at a time and keeps two state variables: whether an opening " has been detected, and whether the previous character was a " within a quoted value.
Shortcomings:
Doesn't handle newlines in quoted values
use strict;
use warnings;

# Note: doesn't handle newlines in quoted values

my $out = $ARGV[0].".tab";
open my $out_fh, '>', $out or die "Can't open output $out: $!\n";

while (<>) {
    my $tab = "";
    my $qv=0; # Quoted value indicator
    my $dq=0; # Double quote flag indicates the previous character was a "
    for (split //) {
        # Start of a quoted value
        if (not $qv and $_ eq '"') { $qv=1; next; }
        # Two consecutive double-quote characters within a quoted value:
        # emit one literal ". This must be tested before the next check,
        # otherwise it can never be reached.
        if ($dq and $_ eq '"') { $dq=0; $tab .= $_; next; }
        # Double quote within or at the end of a quoted value
        if ($qv and $_ eq '"') { $dq=1; next; }
        # If last char was a double quote OR we're not within a quoted value, comma = tab
        if (($dq or not $qv) and $_ eq ',') { $dq=0; $qv=0; $_="\t"; } # End of field
        # Any other character after a " means that " closed the quoted value
        elsif ($dq) { $dq=0; $qv=0; }
        $tab .= $_;
    }
    print $out_fh $tab;
}
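One possible way around the newline shortcoming, offered only as a sketch: a complete CSV record always contains an even number of '"' characters, so physical lines can be glued together until the count is even before the record is handed to the character loop. The helper name read_record is hypothetical, and the sketch assumes well-formed input.

```perl
use strict;
use warnings;

# Hypothetical helper: return the next complete CSV record from $fh,
# joining physical lines while a quoted value is still open.
sub read_record {
    my ($fh) = @_;
    my $record = <$fh>;
    return unless defined $record;
    # tr/"// with an empty replacement only counts the quotes;
    # an odd count means a quoted value continues on the next line
    while (($record =~ tr/"//) % 2) {
        my $more = <$fh>;
        last unless defined $more;   # unterminated quote at end of input
        $record .= $more;
    }
    return $record;
}

# Demo on an in-memory file: the first record spans two physical lines
my $sample = qq{a,"line one\nline two",b\nc,d\n};
open my $in, '<', \$sample or die "Can't open in-memory file: $!\n";
while (defined(my $rec = read_record($in))) {
    print "record: $rec";
}
```

The embedded newline would then reach the loop as ordinary data inside the quoted value. Whether a raw newline is acceptable in the tab-delimited output depends on whatever reads that file, so it may need to be replaced with something else.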
In reply to Converting CSV to tab-delimited by PhilHibbs