http://qs1969.pair.com?node_id=51001

Item Description: Manipulation routines for comma-separated values

Review Synopsis:

Author: Alan Citterman

I had a project where I needed to extract data from a file and send
it to a customer. The file in question was from a database, and it
had been exported to a CSV text file.

I would have tried to write my own regular expression to handle this,
but my overall knowledge of Perl isn't that good. However, after some
research, I found a reference to this module.

#!/usr/bin/perl
use strict;
use Text::CSV;
I knew that the text file had lines of data that I didn't need, and
that there was an easily recognizable pattern in those lines, so I could
use a regular expression to put those lines into a trash file.

my $input="input.csv"; my $output="output.txt"; my $trash="trashfile"; my $csv=Text::CSV->new(); #Creates a new Text::CSV object open(INFILE,$input) || die "Can't open file $input"; open(OUTFILE,">$output") || die "Can't open file $output"; open(TRASH,">$trash") || die "Can't open file $trash";

Now to start reading the data from the file: store each line in the $_ variable
and print it to the trash file if it's not good, or parse the variable and
print it to the output file if it is.

while (<INFILE>) { if (/"X"/) { #The trash data has these 3 characters in it print TRASH "$_\n"; } else { #Now to deal with the data I want to keep if($csv->parse($_)) { #checks to see if data exists in $_ and +parses it if it does my @fields=$csv->fields; # puts the values from each field in an +array my $elements=@fields; #gets the number of elements in the arra +y for ($x=0;$x<$elements;$x++) { print OUTFILE "$fields[$x]\t"; } } } }
Now that the files have been written to, I can close them up and remove
the trash file.

close INFILE;
close OUTFILE;
close TRASH;
unlink $trash;
All in all, a very useful module.

Replies are listed 'Best First'.
Re: Text::CSV
by swiftone (Curate) on Jan 11, 2001 at 02:34 UTC
    I too use Text::CSV quite happily. I spent forever and a day (actually, about 6 hours) trying to debug a problem with it that was finally answered deep within the documentation. When it says:

    Allowable characters within a CSV field include 0x09 (tab) and the inclusive range of 0x20 (space) through 0x7E (tilde).

    It means it. In particular, if you get anything outside of this range, including any MS curly-quotes or (in my case) a single oddly out-of-range byte, it will quietly fail.

    Note that it does not accept newlines inside any field either.
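
    A minimal sketch of that failure mode (the sample line and its stray Windows-1252 byte are made up here for illustration): parse() simply returns false, and error_input() shows what it choked on:

    use strict;
    use Text::CSV;

    my $csv  = Text::CSV->new();
    my $line = qq{"hello","wor\x93ld"};    # \x93 is outside 0x20-0x7E

    if ($csv->parse($line)) {
        print join("\t", $csv->fields), "\n";
    } else {
        # no warning, no die: parse() just returns false
        warn "parse failed on: ", $csv->error_input, "\n";
    }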

      If you use Text::CSV_XS, you can handle those characters by turning on the 'binary' option, like this:
      use Text::CSV_XS;
      $csv = Text::CSV->new({binary => 1});
      ...
      $csv->parse($_);
      @fields = $csv->fields();
      Really nifty if you have to parse a CSV file with French text with accented characters and newlines in it...
        Just to be nit-picky, I think that is:
        $csv = Text::CSV_XS->new({binary => 1});
        To those who pointed out that this would resolve working with text that has accents in it - I love you. I was going absolutely batty trying to deal with this text that I needed to convert to XML.


        -------------------------------------------------------------------
        There are some odd things afoot now, in the Villa Straylight.
Re: Text::CSV
by dkubb (Deacon) on Jan 18, 2001 at 06:33 UTC

    This is an excellent module, but I prefer to use Text::CSV_XS.

    It's an XS-based module, and is quite fast; definitely the fastest Perl module for manipulating quote-comma files on CPAN that I have heard of.

    It has an identical interface to Text::CSV, meaning all you would have to do is a s/Text::CSV/Text::CSV_XS/g on the source file, and it should work the same.

      I had problems with both of those modules when I tried them. The rules for escaping quotes and spaces within a CSV record are very clear, but the modules don't handle all the cases correctly. Every data file would die on some record that was legal but which the modules wouldn't handle correctly. I ended up writing my own state-machine parser for CSV, first splitting the record into characters and so on, just as I would do it in C. It took me a couple of hours to code and test, but as I had already wasted many days messing with broken modules, I considered it time well spent. -Ben M
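
      A rough sketch (not Ben M's actual code, just an illustration of the idea) of a character-by-character state-machine parser that handles quoted fields and doubled quotes:

      use strict;
      use warnings;

      sub parse_csv_line {
          my ($line) = @_;
          my @fields;
          my $field    = '';
          my $in_quote = 0;
          my @chars    = split //, $line;

          for (my $i = 0; $i < @chars; $i++) {
              my $c = $chars[$i];
              if ($in_quote) {
                  if ($c eq '"') {
                      if (defined $chars[$i + 1] && $chars[$i + 1] eq '"') {
                          $field .= '"';        # doubled quote inside a quoted field
                          $i++;
                      } else {
                          $in_quote = 0;        # closing quote
                      }
                  } else {
                      $field .= $c;
                  }
              } else {
                  if ($c eq '"') {
                      $in_quote = 1;            # opening quote
                  } elsif ($c eq ',') {
                      push @fields, $field;     # field separator
                      $field = '';
                  } else {
                      $field .= $c;
                  }
              }
          }
          push @fields, $field;                 # last field
          return @fields;
      }

      print join("|", parse_csv_line(q{a,"b, ""quoted""",c})), "\n";
      # prints: a|b, "quoted"|c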
Re (tilly) 1: Text::CSV
by tilly (Archbishop) on Apr 03, 2001 at 07:53 UTC
    I should get a CPAN ID, etc., and put this there. But in the meantime, Text::xSV is a plausible solution to this problem in pure Perl.
Re: Text::CSV
by TStanley (Canon) on Jan 13, 2001 at 03:25 UTC
    UPDATE:

    One problem that occurred to me later on in this script is that I didn't
    check to see if the $Outfile and $Junkfile existed. When I wrote this script,
    I had already touched those filenames, so I knew they existed and didn't
    have to check for them. So I now add the following code:

    if (!-e $Outfile) {
        system("touch $Outfile");
    }
    system("touch $Junkfile");
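
    (A pure-Perl alternative to shelling out to touch, just a sketch reusing the same $Outfile and $Junkfile variables, would be to open each file for append and close it again:)

    for my $file ($Outfile, $Junkfile) {
        open(my $fh, ">>", $file) || die "Can't create $file: $!";
        close $fh;
    }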

    TStanley
    In the end, there can be only one!
Re: Text::CSV
by wilmer_t (Novice) on Aug 20, 2014 at 13:33 UTC
    It's true that Text::CSV does not directly handle multiline fields, but I found it quite easy to get around.
    My task was to export/import a set of requirements between tools, for which purpose I (amongst other things) needed to add a field and sort the input recursively by parent reference (using Sort::Naturally).

    On reading, I created multilines as

    while ( my $line = $csv->getline($fh) ) {
        # Manipulation of unsorted fields during import
        my @multiline = join(';', @{$line});
        push @content, @multiline;
    }
    if (not $csv->eof) {
        $csv->error_diag();
    }

    and on printing back, I entered data field by field as
    for my $line (@result) {
        my @fields = split(/;/, $line);
        # Manipulation of sorted fields during export
        print $fh "$_;" foreach @fields;
        print $fh "\n";
    }
    with the CSV object created as

    $csv = Text::CSV->new({ binary => 1, auto_diag => 1, sep_char => ';' });

    I.e. I never bothered to try using the module for writing the data back, nor the other suggestions from above :)

      It's true that Text::CSV does not directly handle multiline fields

      What is your definition of "multiline fields"? Because Text::CSV does handle fields with embedded newlines. In your output code, you could have used Text::CSV's print method. (By the way, why split $line only to join it again immediately afterwards?)

      Note that the last post in this thread was ~10 years ago, a lot has changed since then.
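
      A minimal sketch of that approach (the file names here are made up, and the options mirror the ones above plus an eol for output): getline() keeps embedded newlines intact, and print() quotes and escapes them on the way back out:

      use strict;
      use warnings;
      use Text::CSV;

      my $csv = Text::CSV->new({
          binary    => 1,
          auto_diag => 1,
          sep_char  => ';',
          eol       => "\n",
      });

      open my $in,  '<', 'requirements.csv' or die "Can't open requirements.csv: $!";
      open my $out, '>', 'sorted.csv'       or die "Can't open sorted.csv: $!";

      while (my $row = $csv->getline($in)) {
          # $row is an array ref; fields may contain embedded newlines
          # ... manipulate @$row here ...
          $csv->print($out, $row);    # quotes and escapes as needed
      }
      close $out;
      close $in;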

        Thanks for the feedback, Mr Anon :)
        Multiline, meaning embedded newlines, yes - and you are right that the post is old, but since I am closing down a project and browsing through some Perl hacks, I found this thread and thought it worthwhile to comment just the same.
        You might be right, though, that I was not installing the latest Text::CSV when making this script.

        As for the method of parsing and printing, well, I had to do some field manipulation both during import (unsorted) and export (sorted), for which I thought my example could be of use to someone else struggling with CSV file manipulations :)
        Updated the examples accordingly, to better illustrate this.