swapping PIPE for comma in CSV file

dwhite20899 has asked for the wisdom of the Perl Monks concerning the following question:

Replies are listed 'Best First'.
Re: swapping PIPE for comma in CSV file by Tux (Canon) on Jun 26, 2007 at 13:31 UTC
You could use Text::CSV_XS with two objects, and do it safe (untested): `use strict; use warnings; use IO::Handle; use Text::CSV_XS; my $csv_in = Text::CSV_XS->new ({ binary => 1, }); my $csv_out = Text::CSV_XS->new ({ binary => 1, sep_char => "\|", escape_char => "\\", eol => "\n", }); while (my $row = $csv_in->getline (ARGV)) { $csv_out->print (STDOUT, $row); }` [download] Enjoy, Have FUN! H.Merijn	[reply] [d/l]
Re^2: swapping PIPE for comma in CSV file by sgt (Deacon) on Jun 27, 2007 at 14:45 UTC
For the OP just to emphasize the importance of binary in the case you have a def. of CSV that permits embedded newlines. For Merijn. I was going to answer more or less the same to the OT yesterday, but came across a few problems, that made me reinstall the latest versions... One problem was that I used IO::Wrap objects for stdin and stdout and they don't work with the pure perl version, I am not sure why. Maybe it would be better to load IO::Handle directly and have something for those who want efficiency. In this thread I wanted to test the pure perl version as installing an XS module could have been problematic for thew OP. I think that keeping in sync both versions is important... for some reason search.cpan.org gives the version 0.29 Text::CSV_XS but perl -MCPAN -e install qw(Text::CSV_XS)' installs 0.30 the right one I believe if I remember your post on p5p or pm. % steph@apexPDell2 (/home/stephan) % % cat conv_comma2pipe_xs.px #!/usr/bin/perl use strict; use warnings; $\|++; #use IO::Handle; use IO::Wrap; use Text::CSV_XS; # use DDS; # my $in = IO::Wrap::wraphandle(\STDIN) or die; # my $out = IO::Wrap::wraphandle(\STDOUT) or die; # Dump\($in, $out); my $csv_in = Text::CSV_XS->new({ binary => 1, }) or die; my $csv_out = Text::CSV_XS->new({ binary => 1, sep_char => q{\|}, eol => qq{\n}, }) or die; while (defined (my $rec = $csv_in->getline(\STDIN)) ) { { my @fields = @$rec; local $"=q{][}; print {\STDERR} ".rec [@fields]\n"; } $csv_out->print(\STDOUT, $rec); } __END__ % steph@apexPDell2 (/home/stephan) % % cat hi1.csv \| perl+ -w conv_comma2pipe_xs.px .rec [a][b][c] a\|b\|c .rec [a][okay, comma][c] a\|"okay, comma"\|c .rec [a][long line, indeed][end] a\|"long line, indeed"\|end % steph@apexPDell2 (/home/stephan) % % cat hi1.csv a,b,c a,"okay, comma",c a,"long line, indeed",end [download] cheers --stephan p.s I tested on cygwin with perl 5.8.7 and 5.8.8 update:* oops forgot the code...	[reply] [d/l]
Re^3: swapping PIPE for comma in CSV file by Tux (Canon) on Jun 28, 2007 at 07:00 UTC
One problem was that I used IO::Wrap objects for stdin and stdout and they don't work with the pure perl version, I am not sure why. Maybe it would be better to load IO::Handle directly and have something for those who want efficiency. In this thread I wanted to test the pure perl version as installing an XS module could have been problematic for thew OP. I think that keeping in sync both versions is important... The maintainer of Text::CSV_PP is doing a real nice job in trying to keep it in sync with Text::CSV_XS and we do have (a lot) of contact about that. I already had a look at version 1.06, and it passed all tests for 0.30, except the diagnostics tests, which is logical and explainable. That maintainer also got the maintainership for the very old Text::CSV, which will be a wrapper around Text::CSV_XS and Text::CSV_PP and choose the one available, based on a method used in DBI::PurePerl: the environment variable `TEXT_CSV_XS`, and will default to the fastest method available. I have been thinking about the use of IO::Handle, making it either default, or `use`'d automatically, but everything I came up with so far will imply a slowdown, which is IMHO unacceptable. I't a bit of a shame that this is a relative expensive module to load (14 kb of source code). for some reason search.cpan.org gives the version 0.29 Text::CSV_XS but perl -MCPAN -e install qw(Text::CSV_XS)' installs 0.30 the right one I believe if I remember your post on p5p or pm. Maybe I've been working too hard lately on this module, and uploaded too many versions :) Give CPAN some time to sync around the world. I'll have a look at the IO::Wrap thingy Enjoy, Have FUN! H.Merijn	[reply] [d/l] [select]
Re: swapping PIPE for comma in CSV file by tirwhan (Abbot) on Jun 26, 2007 at 13:45 UTC
Don't reinvent the wheel, use Text::CSV_XS: #!/usr/bin/perl use warnings; use strict; use Text::CSV_XS; my $csv = Text::CSV_XS->new(); while (<DATA>) { my $line = $csv->parse($_); print join("\|",$csv->fields)."\n"; } __DATA__ "00000142988AFA836117B1B572FAE4713F200567","9B3702B0E788C6D62996392FE3 +C9786A","05E566DF","J0180794.JPG",32768,3290,"WIN","" "000005EE5E3F6961B78CE4549270DE5D05CBC0CB","8D025B6AE1994A40FCBB5AEC2E +F273F9","5E8D7D42","WabIab, and more.bor",4760,4616,"WIN","" "0000085FC602CD8AD4793A874A47D286DACB0F6A","8BA8BC04896C421A704282E9B8 +7B5520","8D89A85D","fpSDt,Finder,Link.gif",1161,2988,"Solaris","" "00000FF9D0ED9A6B53BC6A9364C07074DE1565F3","A5D49D6DA9D78FD1E7C32D58BC +7A46FB","2D729A1E","cmnres,pdb.dll",76800,1550,"WIN","" [download] OTOH, why do this at all? Commas within quotation marks are common enough in CSV files and (as you've found out) are easy enough to handle. All dogma is stupid.	[reply] [d/l]
Re^2: swapping PIPE for comma in CSV file by dwhite20899 (Friar) on Jun 26, 2007 at 15:36 UTC
Nice. I'll have to dig into CSV_XS. Yeah, I can handle it as is, but I'm getting asked to convert it for others who can't. It's due to a legacy spec that wasn't the greatest, but whatcha gonna do? (I got them to change the spec to put the three static width columns first, and paid for that with a lot of blood - not gonna ask for more)	[reply]
Re: swapping PIPE for comma in CSV file by pseudomonas (Monk) on Jun 26, 2007 at 13:24 UTC
If your data is always going to have that number of fields, you can just work from the front and the back and comma-join all the bits in the middle. `perl -F, -lane'chomp;print join("\|",@F[0..2], join(",", @F[3..$#F-4]),@F[-3..-1])' A/NSRLFile.txt`	[reply] [d/l]
Re^2: swapping PIPE for comma in CSV file by dwhite20899 (Friar) on Jun 26, 2007 at 15:30 UTC
Yeah - that's what I do now, in a VERY long-winded way. I didn't know about @F . Cool!	[reply]
Re: swapping PIPE for comma in CSV file by johngg (Canon) on Jun 26, 2007 at 14:59 UTC
Using split with a limit along with reverse seems to be a good way to isolate the fields in this particular case. However, I would agree with some other Monks recommendations to use a module in most circumstances. I have cut down the length of the fields to avoid too much line wrapping. `use strict; use warnings; while ( <DATA> ) { my @f012 = split m{,}, $_, 4; my $rest = pop @f012; my @f7654 = map { my $r = reverse } split m{,}, reverse($rest), 5; my $f3 = pop @f7654; print join q{\|}, @f012, $f3, reverse @f7654; } __END__ "13F2","E3C9","05E5","J0180794.JPG",32768,3290,"WIN","" "D05C","2EF2","5E8D","WabIab, and more.bor",4760,4616,"WIN","" "6DAC","B87B","8D89","fpSDt,Finder,Link.gif",1161,2988,"Solaris","" "4DE1","BC7A","2D72","cmnres,pdb.dll",76800,1550,"WIN",""` [download] The output. `"13F2"\|"E3C9"\|"05E5"\|"J0180794.JPG"\|32768\|3290\|"WIN"\|"" "D05C"\|"2EF2"\|"5E8D"\|"WabIab, and more.bor"\|4760\|4616\|"WIN"\|"" "6DAC"\|"B87B"\|"8D89"\|"fpSDt,Finder,Link.gif"\|1161\|2988\|"Solaris"\|"" "4DE1"\|"BC7A"\|"2D72"\|"cmnres,pdb.dll"\|76800\|1550\|"WIN"\|""` [download] I hope this is of interest. Cheers, JohnGG Update: There's no need for the lexical in the `map`, `$_` will do, and there's no need to `pop` off `$f3` as we are not doing anything with it. Code becomes `while ( <DATA> ) { my @f012 = split m{,}, $_, 4; my $rest = pop @f012; my @f76543 = map { $_ = reverse } split m{,}, reverse($rest), 5; print join q{\|}, @f012, reverse @f76543; }` [download] Update 2: Using scalars for the first `split` means the `pop` can be avoided and incorporate all the rest in the `print` statement. Code becomes `while ( <DATA> ) { my ($f0, $f1, $f2, $rest) = split m{,}, $_, 4; print join q{\|}, $f0, $f1, $f2, reverse map { $_ = reverse } split m{,}, reverse($rest), 5; }` [download]	[reply] [d/l] [select]
Re: swapping PIPE for comma in CSV file by pseudomonas (Monk) on Jun 26, 2007 at 13:37 UTC
Alternatively, you can always use regexes. This is being silly, though. `perl -lane'sub f{s/^([^,]+),/$1\|/} sub b{s/,([^,]+)$/\|$1/} f;f;f;b;b;b;b; print' A/NSRLFile.txt` Like another commenter suggested, though, I'd use a module unless you really have a good reason not to.	[reply] [d/l]
Re: swapping PIPE for comma in CSV file by Anonymous Monk on Jun 26, 2007 at 14:12 UTC
I generally don't have the luxury of using anything not mentioned on perlfunc. This works for me without any special functions: #!/usr/bin/perl -w use strict; { print join ('\|', (/(^"[^"]"),("[^"]"),("[^"]"),("[^"]"),([0-9] +),([0-9]),("[^"]*")/), "\"\"\n") while (<DATA>); } __DATA__ "00000142988AFA836117B1B572FAE4713F200567","9B3702B0E788C6D62996392FE3 +C9786A","05E566DF","J0180794.JPG",32768,3290,"WIN","" "000005EE5E3F6961B78CE4549270DE5D05CBC0CB","8D025B6AE1994A40FCBB5AEC2E +F273F9","5E8D7D42","WabIab, and more.bor",4760,4616,"WIN","" "0000085FC602CD8AD4793A874A47D286DACB0F6A","8BA8BC04896C421A704282E9B8 +7B5520","8D89A85D","fpSDt,Finder,Link.gif",1161,2988,"Solaris","" "00000FF9D0ED9A6B53BC6A9364C07074DE1565F3","A5D49D6DA9D78FD1E7C32D58BC +7A46FB","2D729A1E","cmnres,pdb.dll",76800,1550,"WIN","" [download] It marks everything in a field, with the fourth field (third if zero based) being the variable-length one. I'm sure you could break it down even further using only perlfuncs but it's a working solution based on the provided input. This is my first post to perlmonks btw but I'm a longtime perl user.	[reply] [d/l]
Re: swapping PIPE for comma in CSV file by LighthouseJ (Sexton) on Jun 27, 2007 at 14:24 UTC
I posted the solution above with only using perlfunc. I just registered this login today. I found yet another solution, which is radically different from anything else I've seen. I got to thinking about my other comment correcting someone else that we should be concerned less with the commas and more with the quotes because they provide a more concrete "rule" for delimiting the items. I was writing an array definition for another file and noticed the definition was identical to the provided data, so I wrote a program which simply evals the data into an array, then prints it out in the right format. I'm sure somebody can write something to clean up the output and make it more elegant but the eval does a bang-up job of reading the data properly. #!/usr/bin/perl -w use strict; { while (<DATA>) { my @i = (); eval "\@i = ($_);"; print "\"$i[0]\"\|\"$i[1]\"\|\"$i[2]\"\|\"$i[3]\"\|$i[4]\|$i[5]\|\"$i[6] +\"\|\"$i[7]\"\n"; } } __DATA__ "00000142988AFA836117B1B572FAE4713F200567","9B3702B0E788C6D62996392FE3 +C9786A","05E566DF","J0180794.JPG",32768,3290,"WIN","" "000005EE5E3F6961B78CE4549270DE5D05CBC0CB","8D025B6AE1994A40FCBB5AEC2E +F273F9","5E8D7D42","WabIab, and more.bor",4760,4616,"WIN","" "0000085FC602CD8AD4793A874A47D286DACB0F6A","8BA8BC04896C421A704282E9B8 +7B5520","8D89A85D","fpSDt,Finder,Link.gif",1161,2988,"Solaris","" "00000FF9D0ED9A6B53BC6A9364C07074DE1565F3","A5D49D6DA9D78FD1E7C32D58BC +7A46FB","2D729A1E","cmnres,pdb.dll",76800,1550,"WIN","" [download] Hopefully this is useful to somebody. "The three principal virtues of a programmer are Laziness, Impatience, and Hubris. See the Camel Book for why." -- `man perl`	[reply] [d/l]
Re^2: swapping PIPE for comma in CSV file by Anonymous Monk on Jun 27, 2007 at 22:05 UTC
oooo, oooo... `perl -wlpe "s/(\042[^\042]?\042)/'$1'/g; $_ = join '\|', eval qq/($_)/" NSRLFile.csv > NSRLfile.psv` [download] see previous post for definitions of NSRLFile. files. \042 is a " char.	[reply] [d/l]
Re: swapping PIPE for comma in CSV file by jwkrahn (Abbot) on Jun 26, 2007 at 14:14 UTC
$ echo '"00000142988AFA836117B1B572FAE4713F200567","9B3702B0E788C6D629 +96392FE3C9786A","05E566DF","J0180794.JPG",32768,3290,"WIN","" "000005EE5E3F6961B78CE4549270DE5D05CBC0CB","8D025B6AE1994A40FCBB5AEC2E +F273F9","5E8D7D42","WabIab, and more.bor",4760,4616,"WIN","" "0000085FC602CD8AD4793A874A47D286DACB0F6A","8BA8BC04896C421A704282E9B8 +7B5520","8D89A85D","fpSDt,Finder,Link.gif",1161,2988,"Solaris","" "00000FF9D0ED9A6B53BC6A9364C07074DE1565F3","A5D49D6DA9D78FD1E7C32D58BC +7A46FB","2D729A1E","cmnres,pdb.dll",76800,1550,"WIN",""' \| \ perl -nle'print join q/"\|"/, split /","/' "00000142988AFA836117B1B572FAE4713F200567"\|"9B3702B0E788C6D62996392FE3 +C9786A"\|"05E566DF"\|"J0180794.JPG",32768,3290,"WIN"\|"" "000005EE5E3F6961B78CE4549270DE5D05CBC0CB"\|"8D025B6AE1994A40FCBB5AEC2E +F273F9"\|"5E8D7D42"\|"WabIab, and more.bor",4760,4616,"WIN"\|"" "0000085FC602CD8AD4793A874A47D286DACB0F6A"\|"8BA8BC04896C421A704282E9B8 +7B5520"\|"8D89A85D"\|"fpSDt,Finder,Link.gif",1161,2988,"Solaris"\|"" "00000FF9D0ED9A6B53BC6A9364C07074DE1565F3"\|"A5D49D6DA9D78FD1E7C32D58BC +7A46FB"\|"2D729A1E"\|"cmnres,pdb.dll",76800,1550,"WIN"\|"" [download]	[reply] [d/l]
Re^2: swapping PIPE for comma in CSV file by Anonymous Monk on Jun 26, 2007 at 14:22 UTC
You forgot what I initially forgot, and that's the numbers that don't have quotes. I think the end one line should look like: `"\|"WabIab, and more.bor"\|4760\|4616\|"WIN"\|""` [download] instead of: `"\|"WabIab, and more.bor",4760,4616,"WIN"\|""` [download] But somebody can feel free to correct me if I'm wrong.	[reply] [d/l] [select]
Re: swapping PIPE for comma in CSV file by Ploux (Acolyte) on Jun 26, 2007 at 15:05 UTC
Why not this? `s/([0-9"]),([0-9"])/$1\\|$2/g` [download]	[reply] [d/l]
Re^2: swapping PIPE for comma in CSV file by pseudomonas (Monk) on Jun 26, 2007 at 15:12 UTC
An embedded comma might sit between two numerals or at the beginning or end of the field next to the quotes.	[reply]
Re^3: swapping PIPE for comma in CSV file by Ploux (Acolyte) on Jun 26, 2007 at 15:40 UTC
Somehow I knew it couldn't be that easy.	[reply]
Re: swapping PIPE for comma in CSV file by girarde (Hermit) on Jun 26, 2007 at 15:15 UTC
I dunno, assuming that the quotes aren't still desired I'd split on `/","/`, join on `\|`, and trim the trailing and leading quotes. Maybe I'm missing something here.	[reply] [d/l] [select]
Re^2: swapping PIPE for comma in CSV file by Anonymous Monk on Jun 26, 2007 at 15:44 UTC
A few people missed this (myself included). You have to split on a pair of double quotes or no quotes at all. There's the field with commas in it which you can't replace with a pipe so a simple regex or split/join won't work (at least without a massive regex on the split).	[reply]
Re^3: swapping PIPE for comma in CSV file by girarde (Hermit) on Jun 26, 2007 at 16:59 UTC
Right, but I'm splitting on the 'quote, comma, quote' pattern, so embedded commas sail through. You can `split` on a pattern, although `join` wants a string.	[reply] [d/l] [select]
Re^4: swapping PIPE for comma in CSV file by Anonymous Monk on Jun 26, 2007 at 17:06 UTC
Re^5: swapping PIPE for comma in CSV file by girarde (Hermit) on Jun 28, 2007 at 13:36 UTC
Re: swapping PIPE for comma in CSV file by Anonymous Monk on Jun 26, 2007 at 20:49 UTC
all you need is not_quote-comma-quote, quote-comma-quote, and quote-comma-not_quote: `echo <DATA> \| perl -F'(?<=[^"]),(?=")\|(?<="),(?=")\|(?<="),(?=[^"])' -lane 'print join("\|",@F)'` though i could be wrong, i just learned about look-ahead/behind today --jd	[reply] [d/l]
Re: swapping PIPE for comma in CSV file by Anonymous Monk on Jun 27, 2007 at 09:20 UTC
i would strongly agree with others who have recommended use of a standard, well tested csv-parsing module. however, if you really have to go commando (and do it on the command line to boot), the following (tested) code should work. note that: NSRLFile.csv contains the example data from the op NSRLFile.psv contains the processed, pipe character-separated data the code makes no assumptions about the absence or presence whitespace around the value-separating commas or at the beginning or end of records double-quoted strings can contain anything, even backslash-escaped double-quotes (or any other backslash-escaped character) the only value supported other than a double-quoted string is a simple numeric with NO sign, decimal point or embedded commas the code makes no assumptions about the order of the fields in a record had to use \042 in place of a double-quote character because otherwise, the xp command-line interpreter gets confused and thinks the first pipe character it sees (in a regex) is an actual command-line pipe operator `perl -wlpe "use strict; my $quo = qq(\042); my $esc = qq(\\\\); my $body = qr/[^$quo$esc]/; my $escaped = qr/$esc ./x; my $sep = qr/ \s* , \s* \| \s* \Z /x; my $quoted = qr( $quo $body* (?: $escaped $body* )* $quo )x; my $number = qr(\d+); $_ = join '\|', m/( $quoted \| $number ) (?= $sep )/gx" NSRLFile.csv > NSRLfile.psv` [download]	[reply] [d/l]
Re: swapping PIPE for comma in CSV file by Moron (Curate) on Jun 27, 2007 at 10:23 UTC
I don't see why you can't just s/\",\"/\"\|\"/g - embedded commas won't mess that up - even a single "," field won't mess up because the "," match at separator preceding it will take precedence and s/ will subsequently position after the executed substition at ,",", causing the next match to occur also at the correct place. (Update: Tested it now on "fred",",","bert" and it worked). I am not going to jump on the "always use a module" bandwagon, because only the OPer can possibly know what the issues are surrounding that - I seem to have to keep saying it in here: it is unsafe to draw conclusions when potentially missing crucial information. __________________________________________________________________________________ ^M Free your mind!	[reply]
Re^2: swapping PIPE for comma in CSV file by tirwhan (Abbot) on Jun 27, 2007 at 10:44 UTC
I don't see why you can't just s/\",\"/\"\|\"/g Perhaps you should have taken a better look at the data in the original node then, or read the several answers that detail why that won't work in this case! (Hint: there are unquoted numbers in the data). Honestly, posting an answer to a day-old thread without even reading the preceding answers seems a complete waste of everyones time. All dogma is stupid.	[reply] [d/l]