
Hi, I'm attempting to write a script that parses CSV files and writes the parsed data back out to a file. Unfortunately, XS modules (including Text::CSV) are not an option at this point.

The CSV files have the following format:

$quoteChar $field $quoteChar $separationChar (no spaces)
So for example:
"Perlmonks", "http://www.perlmonks.org", "excellent ;)"

Entries are delimited by newlines.

So far, I have the following code:

#!/usr/bin/perl -w

use strict;

my $debug = 1;

my $read_file  = 'in.csv';
my $write_file = 'out.csv';

my $arrayref = parseCSV($read_file);

for my $line (@{ $arrayref }) {

    for my $field (@{ $line }) {
    
        print "Field: $field\n";

    }   

}

printCSV($write_file, $arrayref);


# parse a csv file into an array of arrays
sub parseCSV {

    my $file_path = shift;
    
    my $separationChar = ',';
    my $quoteChar      = '"';
    my $escapeChar     = '\\';
    
       
    my $inField = 1;


    my @data;
    # read csv file
    open DATA, $file_path or die("Couldn't read data file: $!");
    while (<DATA>) {
        # remove newline
        chomp;
        # split into single chars
        my @chars = split('', $_);
        # store previous letter (for escape codes)
        my $previous = '';
        my @fields;
        for my $c (@chars) {
            my $dataString;
            if (($c eq $quoteChar) && ($previous ne $escapeChar)) {
            
                if ($inField) {
                    $inField = 0;
                    next;
                } else {
                    $inField = 1;
                    next;
                }
            }
            
                    
            if ($inField) {
            
                # ignore all in-field escape chars
                if ($c eq $escapeChar) {
                    next;
                }
                
                # append char to data string
                $dataString = $dataString . $c
            }
            
            if ((! $inField) and ($c eq $separationChar)) {
                push(@fields, $dataString);
            }                    
            
        }
        push(@data, \@fields);
                
    }        
    
    
    close DATA;    
    
    
    
    # return a reference to an AoA
    return \@data;
}


# format and print an AoA to a CSV file
sub printCSV {

    my $file_path = shift;
    my $entries   = shift; # AoA ref containing entries
    
    my $separationChar = ',';
    my $quoteChar      = '"';
    my $escapeChar     = '\\';    
    
    my @data;
      
    for my $entry (@{$entries}) {
    
        my $entryString = '';
        for my $field (@{ $entry }) {
        
            # escape all existing $quoteChars
            my $escapeQuote = $escapeChar . $quoteChar;
            $field = $field =~ s/$quoteChar/$escapeQuote/;
        
            # enclose in quoteChars
            $field = $quoteChar . $field . $quoteChar;
            debug("Field: $field");
            
            # add on to $entryString
            $entryString = $entryString . $separationChar . $field;
            debug("Entry String: $entryString");            

        }
        # add a newline on the end
        $entryString = $entryString . "\n";
        push(@data, $entryString);
    }
    
    # write @data to the file
    open DATA, ">$file_path" or die("Couldn't open $file_path: $!");
    print DATA @data;    
    close DATA;

    return;

}


sub debug {
    # write to log file instead of <STDOUT>

    my $message = shift;
    
    if ($debug) {
        print $message, "\n";
    }
    
}
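
For comparison, here is a minimal pure-Perl sketch of the per-line parsing step, written as a small state machine. It is not a drop-in replacement for a real CSV module. It assumes every field is quoted, that backslash is the escape character, and that fields never contain newlines. The name parse_csv_line is hypothetical. The accumulator is declared before the character loop, the "inside a field" flag starts out false for every line, and a field is stored when its closing quote is seen, so the last field on a line is not lost.

```perl
use strict;
use warnings;

# Hypothetical helper: parse one CSV line into a list of fields.
# Assumptions: all fields are quoted, backslash escapes the next
# character, and no field spans more than one line.
sub parse_csv_line {
    my ($line) = @_;
    my ($quote, $esc) = ('"', '\\');

    my @fields;
    my $current  = '';   # declared outside the loop so it accumulates
    my $in_field = 0;    # each line starts outside any quoted field
    my $escaped  = 0;    # true right after an escape char inside a field

    for my $c (split //, $line) {
        if ($escaped) {
            # the character after an escape is taken literally
            $current .= $c;
            $escaped = 0;
        }
        elsif ($in_field) {
            if ($c eq $esc) {
                $escaped = 1;
            }
            elsif ($c eq $quote) {
                # closing quote: the field is complete
                push @fields, $current;
                $current  = '';
                $in_field = 0;
            }
            else {
                $current .= $c;
            }
        }
        elsif ($c eq $quote) {
            $in_field = 1;   # opening quote
        }
        # anything else outside quotes (separators, stray spaces) is skipped
    }
    return @fields;
}

# Usage with the sample line from the post
my @sample = parse_csv_line('"Perlmonks", "http://www.perlmonks.org", "excellent ;)"');
print "Field: $_\n" for @sample;
```

Because fields are pushed on the closing quote rather than on the separator, this sketch also tolerates the spaces after the commas in the sample line.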

The two main errors I'm getting right now are:

Use of uninitialized value in concatenation (.) or string
at parseTest.pl line 16.

and 

Use of uninitialized value in substitution (s///)
at parseTest.pl line 113.
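
Both warnings are consistent with two patterns in the posted code, although the reported line numbers cannot be matched against the listing as shown. First, $dataString is declared with my inside the per-character loop, so it is undef again on every iteration. Second, `$field = $field =~ s/.../.../` assigns the return value of the substitution (a count, or an empty string when nothing matched) back to $field instead of keeping the modified string. A small sketch of both pitfalls, with illustrative variable names:

```perl
use strict;
use warnings;

# Pitfall 1: s/// changes its target in place and returns the number of
# substitutions made, so assigning the result back overwrites the string.
my $field = 'say "hi"';
my $count = ($field =~ s/"/\\"/g);   # /g escapes every quote, not just the first
# $field now holds the escaped text; $count holds how many quotes were replaced

# Pitfall 2: a variable declared with "my" inside a loop body is fresh
# (undef) on every iteration, so nothing accumulates across iterations.
my $joined = '';                     # declared once, before the loop
for my $c (split //, 'abc') {
    $joined .= $c;
}

print "$field ($count replacements), $joined\n";
```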

The out.csv file contains junk:


,"","","",""
,""
,""
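
The leading comma on each output line is what you get when the separator is prepended to every field, starting from an empty string. Building the line with join avoids that. A minimal sketch, assuming the same quote, separator, and escape characters as in the post; format_csv_line is a hypothetical name, and escaping the backslash itself is an added assumption so that the output can be read back unambiguously:

```perl
use strict;
use warnings;

# Hypothetical helper: turn a list of fields into one quoted CSV line.
sub format_csv_line {
    my @fields = @_;
    my ($sep, $quote, $esc) = (',', '"', '\\');

    my @quoted = map {
        # work on a copy so the caller's data is not modified via $_
        my $f = defined $_ ? $_ : '';
        # escape every escape char and quote char inside the field
        $f =~ s/([\Q$esc$quote\E])/$esc$1/g;
        $quote . $f . $quote;
    } @fields;

    # join puts the separator only between fields, never before the first
    return join($sep, @quoted) . "\n";
}

print format_csv_line('Perlmonks', 'http://www.perlmonks.org', 'excellent ;)');
```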

Any insights on how to improve the code would be greatly appreciated :)

In reply to Writing a CSV Parser/Printer by Anonymous Monk
