Re: Replace commas with spaces between quotes, parsing CSV

Replies are listed 'Best First'.
Re^2: Replace commas with spaces between quotes, parsing CSV by Marshall (Canon) on Apr 15, 2016 at 19:28 UTC
I ran this on the OP's data set and it looks like it gives the correct answer, but also throws a couple of warnings in the process. I haven't quite figured out why? Update, new code, try to eliminate warnings... Original code... Read more... (3 kB) New version with OP's input line: #!usr/bin/perl use warnings; use strict; $\|=1; my $line = '1925,47365,2,650187016,1,1,"MADE FOR DRAWDOWNS, NEVER P/U" +,16,IFC 8112NP,Standalone-6,,,44,10/22/2015,91607,,B24W02651,,"PA-3, +PURE",4/28/2015,1,0,,1,MAN,,CUST,,CUSTOM MATCH,0,TRUE,TRUE,O,C48A0D00 +1EF449E3AB97F0B98C811B1B,POS.MISTINT.V0000.UP.Q,PROD_SMISA_BK,414D512 +050524F445F504F5331393235906F28561D2F0020,10/22/2015 9:29,10/22/2015 +9:30'; my $i=0; my $line2 = join '', grep { $i++ % 2 ? {s/,/ /g} : 1} split /"/,$line; print "line 1=$line\n"; print "line 2=$line2\n"; my $j=0; # recoded into a slightly different way... my @csv; foreach (split /"/,$line) { s/,/ /g if $j++ % 2; push @csv, $_; } print "\n\n$line\n"; print @csv,"\n"; __END__ first version, 2 warnings.... Odd number of elements in anonymous hash at C:\Projects_Perl\handledou +blequotecsv.pl line 9. Odd number of elements in anonymous hash at C:\Projects_Perl\handledou +blequotecsv.pl line 9. line 1=1925,47365,2,650187016,1,1,"MADE FOR DRAWDOWNS, NEVER P/U",16,I +FC 8112NP,Standalone-6,,,44,10/22/2015,91607,,B24W02651,,"PA-3, PURE" +,4/28/2015,1,0,,1,MAN,,CUST,,CUSTOM MATCH,0,TRUE,TRUE,O,C48A0D001EF44 +9E3AB97F0B98C811B1B,POS.MISTINT.V0000.UP.Q,PROD_SMISA_BK,414D51205052 +4F445F504F5331393235906F28561D2F0020,10/22/2015 9:29,10/22/2015 9:30 line 2=1925,47365,2,650187016,1,1,MADE FOR DRAWDOWNS NEVER P/U,16,IFC + 8112NP,Standalone-6,,,44,10/22/2015,91607,,B24W02651,,PA-3 PURE,4/2 +8/2015,1,0,,1,MAN,,CUST,,CUSTOM MATCH,0,TRUE,TRUE,O,C48A0D001EF449E3A +B97F0B98C811B1B,POS.MISTINT.V0000.UP.Q,PROD_SMISA_BK,414D512050524F44 +5F504F5331393235906F28561D2F0020,10/22/2015 9:29,10/22/2015 9:30 2nd version no warnings.... 1925,47365,2,650187016,1,1,"MADE FOR DRAWDOWNS, NEVER P/U",16,IFC 8112 +NP,Standalone-6,,,44,10/22/2015,91607,,B24W02651,,"PA-3, PURE",4/28/2 +015,1,0,,1,MAN,,CUST,,CUSTOM MATCH,0,TRUE,TRUE,O,C48A0D001EF449E3AB97 +F0B98C811B1B,POS.MISTINT.V0000.UP.Q,PROD_SMISA_BK,414D512050524F445F5 +04F5331393235906F28561D2F0020,10/22/2015 9:29,10/22/2015 9:30 1925,47365,2,650187016,1,1,MADE FOR DRAWDOWNS NEVER P/U,16,IFC 8112NP +,Standalone-6,,,44,10/22/2015,91607,,B24W02651,,PA-3 PURE,4/28/2015, +1,0,,1,MAN,,CUST,,CUSTOM MATCH,0,TRUE,TRUE,O,C48A0D001EF449E3AB97F0B9 +8C811B1B,POS.MISTINT.V0000.UP.Q,PROD_SMISA_BK,414D512050524F445F504F5 +331393235906F28561D2F0020,10/22/2015 9:29,10/22/2015 9:30 Process completed successfully [download] Of course @csv chould be joined back together since it was split on '"' instead of ',' and then re-split on comma. This algorithm does appear to work when reformulated and does its job of getting rid of commas within double quotes without run-time warnings. Replacing the "," with a space might not be the "best". possible improvement is to reduce multiple spaces to single spaces.	[reply] [d/l] [select]
Re^3: Replace commas with spaces between quotes, parsing CSV by BigRedEO (Acolyte) on Apr 15, 2016 at 19:43 UTC
I had tried throwing his line in my script here - `my $filename = 'tested.csv'; open my $FH, $filename or die "Could not read from $filename <$!>, program halting."; # Read the header line. chomp(my $line = <$FH>); my $i = 0; $line = join '', grep { $i++ % 2 ? {s/,/ /g} : 1} split /"/,$line; my @fields = split(/,/, $line); print Dumper(@fields), $/; my @data; # Read the lines one by one. while($line = <$FH>) {` [download] and got this when I tried to run the script - `Use of uninitialized value in join or string at AlterDataNew.pl line 2 +9, <$FH> line 5.` [download]	[reply] [d/l] [select]
Re^3: Replace commas with spaces between quotes, parsing CSV by BigRedEO (Acolyte) on Apr 15, 2016 at 20:02 UTC
Curious - what is the line with the pipe here doing - $\|=1;	[reply]
Re^4: Replace commas with spaces between quotes, parsing CSV by Marshall (Canon) on Apr 15, 2016 at 20:21 UTC
The line $\|=1 turns off buffering on STDOUT. I always use that when debugging code. STDERR is unbuffered by default and STDOUT is buffered by default. What that means is that when writing to STDOUT, you may have to write say 4Kbytes of data before it actually gets output to the screen or the hard disk. This is what you want for the program to run fast. However, that means that the unbuffered STDERR messages come out immediately and "out of order" with the STDOUT printout. In other words, it can happen that the error message appears first and then much later the stuff associated with the error appears. There is a big performance penalty for using $\|=1, especially to the HD. The disk system works most efficiently with "blocks". A typical set up will be 512 byte fundamental "writeable" units on the HD, these are grouped together 8 at a time, to make a 4K block. And that is what is used to write to the disk and what is tracked by the directory system. $\|=1 will force a write for every "print" statement instead of have this buffered up to some larger number that the I/O system would rather deal with. You will find that even a one bye text file takes a minimum size unit on the disk. Update with a demo: `#!/usr/bin/perl use warnings; use strict; my $x = undef; print "asdwf\n"; print "n,zxncv\n"; print "$x\n"; =Prints Use of uninitialized value $x in concatenation (.) or string at C:\Pro +jects_Perl\demounbuffer.pl line 8. asdwf n,zxncv =cut $\|=1; $x = undef; print "asdwf\n"; print "n,zxncv\n"; print "$x\n"; =Prints asdwf n,zxncv Use of uninitialized value $x in concatenation (.) or string at C:\Pro +jects_Perl\demounbuffer.pl line 20. =cut` [download] In the first code, the error message comes out first although this is the third print statement!!! In the second code (the same code, but with $\|=1;), the error comes out 3rd in the print order! That more realistically presents the time line order of what actually happened. I also show another "Perl trick", I used the PerlDoc feature to insert some explanation into the code. This is normally used to insert documentation at the beginning of the code, but I "cheated".	[reply] [d/l]
Re^2: Replace commas with spaces between quotes, parsing CSV by BigRedEO (Acolyte) on Apr 15, 2016 at 18:51 UTC
So how do I use that within my code here? Something like: `my $i = 0; $_ = join '', grep { $i++ % 2 ? {s/,/ /g} : 1} split /"/ for @fields +[7,19];` [download] Or is that not correct?	[reply] [d/l]
Re^3: Replace commas with spaces between quotes, parsing CSV by GotToBTru (Prior) on Apr 15, 2016 at 18:58 UTC
You'll want to apply this function to $line before you split it into @fields. `while(...) chomp($line); $line = join '', grep { $i++ % 2 ? {s/,/ /g} : 1} split /"/,$line; @fields = split(/,/,$line);` [download] But God demonstrates His own love toward us, in that while we were yet sinners, Christ died for us. Romans 5:8 (NASB)	[reply] [d/l]
Re^4: Replace commas with spaces between quotes, parsing CSV by BigRedEO (Acolyte) on Apr 15, 2016 at 19:22 UTC
Does not seem to be working. I am using a test file with only five lines in it for now (the actual file will have millions of records). Here is the error I'm getting using your line - Use of uninitialized value in join or string at AlterDataNew.pl line 2 +9, <$FH> line 5. Use of uninitialized value in join or string at AlterDataNew.pl line 2 +9, <$FH> line 5. Use of uninitialized value in join or string at AlterDataNew.pl line 2 +9, <$FH> line 5. Use of uninitialized value in join or string at AlterDataNew.pl line 2 +9, <$FH> line 5. Use of uninitialized value in join or string at AlterDataNew.pl line 2 +9, <$FH> line 5. Use of uninitialized value in join or string at AlterDataNew.pl line 2 +9, <$FH> line 5. Error parsing time at /usr/perl5/5.12/lib/sun4-solaris-64int/Time/Piec +e.pm line 470, <$FH> line 5. [download] And here is the entirety of my script currently with your line added - #!/usr/bin/perl/ use strict; use warnings; use Data::Dumper; use Time::Piece; my $filename = 'tested.csv'; open my $FH, $filename or die "Could not read from $filename <$!>, program halting."; # Read the header line. chomp(my $line = <$FH>); my $i = 0; $line = join '', grep { $i++ % 2 ? {s/,/ /g} : 1} split /"/,$line; my @fields = split(/,/, $line); print Dumper(@fields), $/; my @data; # Read the lines one by one. while($line = <$FH>) { # split the fields, concatenate the first three fields, # and add it to the beginning of each line in the file chomp($line); my @fields = split(/,/, $line); unshift @fields, join '_', @fields[0..2]; $_ = join '-', (split /\//)[2,0,1] for @fields[14,20,23]; $_ = Time::Piece->strptime($_,'%m/%d/%Y %H:%M')->strftime('%Y-%m-% +d %H:%M') for @fields[38,39]; push @data, \@fields; } close $FH; print "Unsorted:\n", Dumper(@data); #, $/; @data = sort { $a->[0] cmp $b->[0] \|\| $a->[20] cmp $b->[20] \|\| $a->[23] cmp $b->[23] \|\| $a->[26] cmp $b-> [26] } @data; open my $OFH, '>', '/swpkg/shared/batch_processing/mistints/parsedMist +ints.csv'; print $OFH join(',', @$_), $/ for @data; close $OFH; exit; [download]	[reply] [d/l] [select]