Hi, I'm attempting to write a script to parse csv files, and print them to a file. Unfortunately XS modules (including Text::CSV) are not an option at this point.

The CSV files have the following format:

$quoteChar $field $quoteChar $separationChar (no spaces) So for example: "Perlmonks", "http://www.perlmonks.org", "excellent ;)"

Entries are delimited by newlines.

So far, I have the following code:

#!/usr/bin/perl -w use strict; my $debug = 1; my $read_file = 'in.csv'; my $write_file = 'out.csv'; my $arrayref = parseCSV($read_file); for my $line (@{ $arrayref }) { for my $field (@{ $line }) { print "Field: $field\n"; } } printCSV($write_file, $arrayref); # parse a csv file into an array of arrays sub parseCSV { my $file_path = shift; my $separationChar = ','; my $quoteChar = '"'; my $escapeChar = '\\'; my $inField = 1; my @data; # read csv file open DATA, $file_path or die("Couldn't read data file: $!"); while (<DATA>) { # remove newline chomp; # split into single chars my @chars = split('', $_); # store previous letter (for escape codes) my $previous = ''; my @fields; for my $c (@chars) { my $dataString; if (($c eq $quoteChar) && ($previous ne $escapeChar)) { if ($inField) { $inField = 0; next; } else { $inField = 1; next; } } if ($inField) { # ignore all in-field escape chars if ($c eq $escapeChar) { next; } # append char to data string $dataString = $dataString . $c } if ((! $inField) and ($c eq $separationChar)) { push(@fields, $dataString); } } push(@data, \@fields); } close DATA; # return a reference to an AoA return \@data; } # format and print an AoA to a CSV file sub printCSV { my $file_path = shift; my $entries = shift; # AoA ref containing entries my $separationChar = ','; my $quoteChar = '"'; my $escapeChar = '\\'; my @data; for my $entry (@{$entries}) { my $entryString = ''; for my $field (@{ $entry }) { # escape all existing $quoteChars my $escapeQuote = $escapeChar . $quoteChar; $field = $field =~ s/$quoteChar/$escapeQuote/; # enclose in quoteChars $field = $quoteChar . $field . $quoteChar; debug("Field: $field"); # add on to $entryString $entryString = $entryString . $separationChar . $field; debug("Entry String: $entryString"); } # add a newline on the end $entryString = $entryString . "\n"; push(@data, $entryString); } # write @data to the file open DATA, ">$file_path" or die("Couldn't open $file_path: $!"); print DATA @data; close DATA; return; } sub debug { # write to log file instead of <STDOUT> my $message = shift; if ($debug) { print $message, "\n"; } }

The two main errors I'm getting right now are:

Use of uninitialized value in concatenation (.) or string at parseTest.pl line 16. and Use of uninitialized value in substitution (s///) at parseTest.pl line 113.

The out.csv files contains junk:

,"","","","" ,"" ,""

Any insights on how to improve the code would be greatly appreciated :)


In reply to Writing a CSV Parser/Printer by Anonymous Monk

Title:
Use:  <p> text here (a paragraph) </p>
and:  <code> code here </code>
to format your post, it's "PerlMonks-approved HTML":



  • Posts are HTML formatted. Put <p> </p> tags around your paragraphs. Put <code> </code> tags around your code and data!
  • Titles consisting of a single word are discouraged, and in most cases are disallowed outright.
  • Read Where should I post X? if you're not absolutely sure you're posting in the right place.
  • Please read these before you post! —
  • Posts may use any of the Perl Monks Approved HTML tags:
    a, abbr, b, big, blockquote, br, caption, center, col, colgroup, dd, del, details, div, dl, dt, em, font, h1, h2, h3, h4, h5, h6, hr, i, ins, li, ol, p, pre, readmore, small, span, spoiler, strike, strong, sub, summary, sup, table, tbody, td, tfoot, th, thead, tr, tt, u, ul, wbr
  • You may need to use entities for some characters, as follows. (Exception: Within code tags, you can put the characters literally.)
            For:     Use:
    & &amp;
    < &lt;
    > &gt;
    [ &#91;
    ] &#93;
  • Link using PerlMonks shortcuts! What shortcuts can I use for linking?
  • See Writeup Formatting Tips and other pages linked from there for more info.