Writing CSV files

EyeOpener has asked for the wisdom of the Perl Monks concerning the following question:

Replies are listed 'Best First'.
Re: Writing CSV files by tachyon (Chancellor) on Oct 28, 2004 at 23:24 UTC
Seems like a somewhat strange requirement. Presumably the application is dumb as a box of hammers and is quoting any internal *char type. Essentially you need to remember if a given field was quoted or not which is typically information that CSV parsers discard. You really need a specialised split that returns array refs for the fields where the first element is the field data and the second the quote status. Here is the basic idea: `#!/usr/bin/perl -w use strict; while( my $data = <DATA> ) { chomp($data); next unless $data; my $fields = get_fields( $data ); $fields->[rand(3)]->[0] = 'newval'; my $str = remap_fields( $fields ); print "$str\n"; } sub get_fields { [ map{ s/^"\|"$//g ? [ $_, 1 ] : [ $_, 0 ] } split /(?<=\S),(?=\S)/ +, $_[0] ]; } sub remap_fields { join ',', map{ $_->[1] ? qq!"$_->[0]"! : $_->[0] } @{$_[0]}; } __DATA__ "hi",3,20.6,"green","32" 16,"alpha",0.00` [download] The get_fields() function may or may not work for you. The logic it uses not to split on embedded commas is to only split on \S,\S, This is making the invalid assumption that embeded commas probably look, like, this ie have spaces. You may have no embedded commas or the assumption may be true. If it is not you will need a better parser. cheers tachyon	[reply] [d/l]
Re: Writing CSV files by Joost (Canon) on Oct 28, 2004 at 23:14 UTC
I've never worked with this kind of weird CSV format, but both Text::xSV and Text::CSV_XS handle quoting better than Text::CVS, so you should probably try those. "What should it profit a man, if he should win a flame war, yet lose his cool?"	[reply]
Re^2: Writing CSV files by pg (Canon) on Oct 28, 2004 at 23:29 UTC
It is not really that weird. Some of the application that is type-aware, do generate CSV file like this, especially database applications, adn Text::CSV is ready to handle this, both in and out.	[reply]
Re: Writing CSV files by ikegami (Patriarch) on Oct 28, 2004 at 23:32 UTC
Like this? `sub dequote { local $_ = @_ ? $_[0] : $_; return $_ unless (s/^"(.)"$/$1/s); s/\\(.)/$1/gs; return $_; } my @data; while (<DATA>) { chomp; my @fields; $_ = $_ . ','; push(@fields, $1) while (/\G("(?:[^"\\]\|\\.)"\|[^,]*)(?:,)/g); @fields = map { dequote } @fields; push(@data, \@fields); } require Data::Dumper; print(Data::Dumper::Dumper(\@data)); __DATA__ "hi",3,20.6,"green","32" 16,"alpha",0.00` [download]	[reply] [d/l]
Re: Writing CSV files by dragonchild (Archbishop) on Oct 29, 2004 at 00:38 UTC
I'm pretty sure, having read the code, that you can subclass Text::xSV to do what you need. Or, rather, since tilly didn't write it to be subclassed, you might have to do some sort of redefinition of some of the functions, but it shouldn't be that painful. If you want some help, give us a working example of what you need and we can help figure it out. Being right, does not endow the right to be rude; politeness costs nothing. Being unknowing, is not the same as being stupid. Expressing a contrary opinion, whether to the individual or the group, is more often a sign of deeper thought than of cantankerous belligerence. Do not mistake your goals as the only goals; your opinion as the only opinion; your confidence as correctness. Saying you know better is not the same as explaining you know better.	[reply]
Re: Writing CSV files by pg (Canon) on Oct 28, 2004 at 23:27 UTC
With Text::CSV, if you want to embed double-quote, represent it by a pair of consecutive double-quotes.	[reply]
Re: Writing CSV files by steves (Curate) on Oct 29, 2004 at 09:50 UTC
The default behavior for Text::CSV_XS works as you want things to. Example: `use strict; use Text::CSV_XS; use IO::File; my $csv = Text::CSV_XS->new(); my @row; while (<DATA>) { chomp; @row = split(/\\|/, $_); $csv->print(\*STDOUT, \@row); print "\n"; } __DATA__ Field1\|Field Number Two\|Field Number Three\|Field4 Another One\|foo\|fubar\|foo and fubar` [download] produces: `Field1,"Field Number Two","Field Number Three",Field4 "Another One",foo,fubar,"foo and fubar"` [download] I've found Text::CSV_XS to be the most accurate at generating true CSV, versus just delimited by commas, which some people seem to think is CSV.	[reply] [d/l] [select]
Re^2: Writing CSV files by tachyon (Chancellor) on Oct 29, 2004 at 12:03 UTC
The default behavior for Text::CSV_XS works as you want things to. Actually what you have shown and what Text::CSV_XS does has no relationship whatsoever to what the OP wanted.	[reply]
Re^3: Writing CSV files by steves (Curate) on Oct 29, 2004 at 13:38 UTC
You are correct tachyon. It took me three cups of coffee to see that. My apologies. The issue here is rather custom since none of the CSV packages I've used will tell you if something was quoted coming in -- they hide that. It seems like to do this right you need to know what was quoted coming in and mirror that on the way out. If the rules are based on content (e.g., something that can be regexp matched) then maybe you don't need to know how it was originally.	[reply]
Re: Writing CSV files by jZed (Prior) on Oct 29, 2004 at 15:32 UTC
As far as I can tell, Text::CSV can handle all of this except writing out with quotes the way I need. It either writes everything with quotes or nothing. Text::CSV_XS takes a different approach. By default it quotes only fields that need to be quoted for parsability i.e. it quotes fields that have embedded separator or delimiter characters and does not quote fields that do not contain them i.e. foo,7,bar,"x,y",qux. You can also specify always quote to get quote characters delimiting all fields. You might want to examine your data to make sure that isn't what is happening. If your fields are not quoted according to their parseability, then it is most likely that they are quoted by field type (as understood by the original app) so that they will be consistent by field. In other words if fields 1, 4, and 5 are thought by the original app to contain strings, then fields 1, 4, and 5 will be quoted. Here's a script which will honor the original apps' field definitions - it reads CSV fields (including embedded commas) and writes out the fields quoted as in the original. It uses AnyData, another CSV-capable module. The example uses a DATA section, but the module also supports reading from files. `#!/usr/bin/perl -w use strict; use AnyData; my %is_quoted_field = ( c1=>1, c4=>1, c5=>1 ); my @cols = qw ( c1 c2 c3 c4 c5 c6 ); my $table = adTie('CSV',[<DATA>],'r',{cols=>[@cols]}); while (my $row = each %$table) { my @fields; for (@cols) { my $field = ($is_quoted_field{$_}) ? q{"} . $row->{$_} . q{"} : $row->{$_}; push @fields,$field; } print join(',',@fields)."\n"; } __DATA__ "bye",3,27.6,"green","32",stuff "x,y",5,28.2,"yellow","33",more stuff` [download]	[reply] [d/l]
Re: Writing CSV files by TilRMan (Friar) on Oct 29, 2004 at 15:07 UTC
Here's a quick subclass of Text::CSV_XS to return you an opaque $quote object from getline() that you pass to print(). Two gotchas though: First, commas embedded in a field screw it up. Second, I only implemented new, getline(), and print(); not combine() or parse() or anything else. Read more... (2 kB) If the strings can have commas, I don't know if Text::CSV_XS can help you.	[reply] [d/l]
Re: Writing CSV files by EyeOpener (Scribe) on Oct 29, 2004 at 17:22 UTC
Many thanks to tachyon, TilRMan, jZed and others for the helpful responses! Yes, this is an ancient ('70's era) app that has its own bizarre (but consistent) logic for maintaining these files. Unfortunately, it does allow for both commas and quote characters within quoted text fields, which is why I didn't want to get into the business of writing my own parser. As I mentioned, I do have file layouts detailing the datatypes of each field I'm going to encounter. I think the approach I'll take will be to read the file layouts into a data structure, and re-quote the text fields when they're in an array. That way I can still use the well-tested Text::CSV or Text::CSV_XS for parsing and rewriting the CSVs. It'll take more work, but that's probably the most bulletproof and flexible approach. Again, thanks! -E	[reply]