BigRedEO has asked for the wisdom of the Perl Monks concerning the following question:
I've asked this question elsewhere, this after have tried many things I've found searching Google, but have not had luck yet. I will try here (hopefully I'll be much better at posting my question after having so many problems the first time I tried to post a question here earlier this week. I'm such a total noob) -
I have happened upon a problem with a program I'm writing that parses through a CSV file with a few million records: two fields in each record have comments that users have input, and sometimes they use commas within their comments. If there are commas input, that field will be contained in double quotes in the record. I need to replace any commas found in those fields with a space. Here is one such record from the file to give you an idea. -
1925,47365,2,650187016,1,1,"MADE FOR DRAWDOWNS, NEVER P/U",16,IFC 8112NP,Standalone-6,,,44,10/22/2015,91607,,B24W02651,,"PA-3, PURE",4/28/2015,1,0,,1,MAN,,CUST,,CUSTOM MATCH,0,TRUE,TRUE,O,C48A0D001EF449E3AB97F0B98C811B1B,POS.MISTINT.V0000.UP.Q,PROD_SMISA_BK,414D512050524F445F504F5331393235906F28561D2F0020,10/22/2015 9:29,10/22/2015 9:30
NOTE - I do not have the Text::CSV module available to me, nor will it be made available in the server I am using.
Here is part of my code in parsing this file. The first thing I do is concatenate the very first three fields and prepend that concatenated field to each line. Then I want to clear out the commas in @fields7,19 (the thing I'm having trouble figuring out), then format the DATE in three fields and the DATETIME in two fields. The only part I can't figure out is replacing those commas with a space. Is there a tr/,/ /; line or regex line or anything else that would work? -
my @data;
# Read the lines one by one.
while ( $line = <$FH> ) {
# split the fields, concatenate the first three fields,
# and add it to the beginning of each line in the file
chomp($line);
my @fields = split(/,/, $line);
unshift @fields, join '_', @fields[0..2];
# remove user input commas in fields 7 and 19
$_ = for fields[7,19];
# format DATE and DATETIME fields for MySQL/sqlbatch60
$_ = join '-', (split /\//)[2,0,1] for @fields[14,20,23];
$_ = Time::Piece->strptime($_,'%m/%d/%Y %H:%M')->strftime('%Y-
+%m-%d %H:%M') for @fields[38,39];
# write the parsed record back to the file
push @data, \@fields;
}
Re: Replace commas with spaces between quotes, parsing CSV
by GotToBTru (Prior) on Apr 15, 2016 at 18:27 UTC
|
More than likely, I learned this trick at PM.
$i=0;
$line = join '', grep { $i++ % 2 ? {s/,/ /g} : 1} split /"/,$line;
while(<DATA>) {
print;
$i = 0;
print join '', grep { $i++ % 2 ? {s/,/ /g} : 1} split /"/;
print "\n";
}
__DATA__
one,two,three
one,"two,three,five",four
"one,two",three
one,two,"three,"
one,two,three
one,two,three
one,"two,three,five",four
one,two three five,four
"one,two",three
one two,three
one,two,"three,"
one,two,three
But God demonstrates His own love toward us, in that while we were yet sinners, Christ died for us. Romans 5:8 (NASB)
| [reply] [Watch: Dir/Any] [d/l] [select] |
|
I ran this on the OP's data set and it looks like it gives the correct answer, but also throws a couple of warnings in the process. I haven't quite figured out why?
Update, new code, try to eliminate warnings...
Original code...
New version with OP's input line:
#!usr/bin/perl
use warnings;
use strict;
$|=1;
my $line = '1925,47365,2,650187016,1,1,"MADE FOR DRAWDOWNS, NEVER P/U"
+,16,IFC 8112NP,Standalone-6,,,44,10/22/2015,91607,,B24W02651,,"PA-3,
+PURE",4/28/2015,1,0,,1,MAN,,CUST,,CUSTOM MATCH,0,TRUE,TRUE,O,C48A0D00
+1EF449E3AB97F0B98C811B1B,POS.MISTINT.V0000.UP.Q,PROD_SMISA_BK,414D512
+050524F445F504F5331393235906F28561D2F0020,10/22/2015 9:29,10/22/2015
+9:30';
my $i=0;
my $line2 = join '', grep { $i++ % 2 ? {s/,/ /g} : 1} split /"/,$line;
print "line 1=$line\n";
print "line 2=$line2\n";
my $j=0; # recoded into a slightly different way...
my @csv;
foreach (split /"/,$line)
{
s/,/ /g if $j++ % 2;
push @csv, $_;
}
print "\n\n$line\n";
print @csv,"\n";
__END__
first version, 2 warnings....
Odd number of elements in anonymous hash at C:\Projects_Perl\handledou
+blequotecsv.pl line 9.
Odd number of elements in anonymous hash at C:\Projects_Perl\handledou
+blequotecsv.pl line 9.
line 1=1925,47365,2,650187016,1,1,"MADE FOR DRAWDOWNS, NEVER P/U",16,I
+FC 8112NP,Standalone-6,,,44,10/22/2015,91607,,B24W02651,,"PA-3, PURE"
+,4/28/2015,1,0,,1,MAN,,CUST,,CUSTOM MATCH,0,TRUE,TRUE,O,C48A0D001EF44
+9E3AB97F0B98C811B1B,POS.MISTINT.V0000.UP.Q,PROD_SMISA_BK,414D51205052
+4F445F504F5331393235906F28561D2F0020,10/22/2015 9:29,10/22/2015 9:30
line 2=1925,47365,2,650187016,1,1,MADE FOR DRAWDOWNS NEVER P/U,16,IFC
+ 8112NP,Standalone-6,,,44,10/22/2015,91607,,B24W02651,,PA-3 PURE,4/2
+8/2015,1,0,,1,MAN,,CUST,,CUSTOM MATCH,0,TRUE,TRUE,O,C48A0D001EF449E3A
+B97F0B98C811B1B,POS.MISTINT.V0000.UP.Q,PROD_SMISA_BK,414D512050524F44
+5F504F5331393235906F28561D2F0020,10/22/2015 9:29,10/22/2015 9:30
2nd version no warnings....
1925,47365,2,650187016,1,1,"MADE FOR DRAWDOWNS, NEVER P/U",16,IFC 8112
+NP,Standalone-6,,,44,10/22/2015,91607,,B24W02651,,"PA-3, PURE",4/28/2
+015,1,0,,1,MAN,,CUST,,CUSTOM MATCH,0,TRUE,TRUE,O,C48A0D001EF449E3AB97
+F0B98C811B1B,POS.MISTINT.V0000.UP.Q,PROD_SMISA_BK,414D512050524F445F5
+04F5331393235906F28561D2F0020,10/22/2015 9:29,10/22/2015 9:30
1925,47365,2,650187016,1,1,MADE FOR DRAWDOWNS NEVER P/U,16,IFC 8112NP
+,Standalone-6,,,44,10/22/2015,91607,,B24W02651,,PA-3 PURE,4/28/2015,
+1,0,,1,MAN,,CUST,,CUSTOM MATCH,0,TRUE,TRUE,O,C48A0D001EF449E3AB97F0B9
+8C811B1B,POS.MISTINT.V0000.UP.Q,PROD_SMISA_BK,414D512050524F445F504F5
+331393235906F28561D2F0020,10/22/2015 9:29,10/22/2015 9:30
Process completed successfully
Of course @csv chould be joined back together since it was split on '"' instead of ',' and then re-split on comma. This algorithm does appear to work when reformulated and does its job of getting rid of commas within double quotes without run-time warnings. Replacing the "," with a space might not be the "best". possible improvement is to reduce multiple spaces to single spaces. | [reply] [Watch: Dir/Any] [d/l] [select] |
|
my $filename = 'tested.csv';
open my $FH, $filename
or die "Could not read from $filename <$!>, program halting.";
# Read the header line.
chomp(my $line = <$FH>);
my $i = 0;
$line = join '', grep { $i++ % 2 ? {s/,/ /g} : 1} split /"/,$line;
my @fields = split(/,/, $line);
print Dumper(@fields), $/;
my @data;
# Read the lines one by one.
while($line = <$FH>) {
and got this when I tried to run the script -
Use of uninitialized value in join or string at AlterDataNew.pl line 2
+9, <$FH> line 5.
| [reply] [Watch: Dir/Any] [d/l] [select] |
|
Curious - what is the line with the pipe here doing -
$|=1;
| [reply] [Watch: Dir/Any] |
|
|
So how do I use that within my code here? Something like:
my $i = 0;
$_ = join '', grep { $i++ % 2 ? {s/,/ /g} : 1} split /"/ for @fields
+[7,19];
Or is that not correct? | [reply] [Watch: Dir/Any] [d/l] |
|
while(...)
chomp($line);
$line = join '', grep { $i++ % 2 ? {s/,/ /g} : 1} split /"/,$line;
@fields = split(/,/,$line);
But God demonstrates His own love toward us, in that while we were yet sinners, Christ died for us. Romans 5:8 (NASB)
| [reply] [Watch: Dir/Any] [d/l] |
|
Re: Replace commas with spaces between quotes, parsing CSV
by AnomalousMonk (Archbishop) on Apr 16, 2016 at 01:16 UTC
|
I don't know if you have a satisfactory solution yet, but this seems to work. Note that the substitution is done on any and all double-quoted strings in a record, not specific ones. If this is not acceptable, then you already seem able to extract specific fields and the substitution can be done on those fields individually. Note also that the /r modifier of the tr///r version of the substitution is available only for Perl versions from 5.14 onward; the other tr/// expression works in any Perl version. (I've also used the __DATA__ block from poj's code.)
File repl_commas_in_csv_1.pl:
Update: I've just noticed that the AnonyMonk posted essentially the same idea here. I think the double-quote regex I'm using is a bit more robust than that one, but it still must be considered fragile compared to what may appear in a fully-fledged CSV record — a caveat that, I admit, I should have included in my original reply!
Give a man a fish: <%-{-{-{-<
| [reply] [Watch: Dir/Any] [d/l] [select] |
Re: Replace commas with spaces between quotes, parsing CSV
by poj (Abbot) on Apr 15, 2016 at 18:58 UTC
|
#!perl
use strict;
use Time::Piece;
use Text::ParseWords;
use Data::Dumper;
my @data = ();
while (my $line = <DATA>){
my @fields = quotewords(',', 0, $line);
unshift @fields, join '_', @fields[0..2];
s/,//g for @fields[7,19];
$_ = join '-', (split /\//)[2,0,1] for @fields[14,20,23];
$_ = Time::Piece->strptime($_,'%m/%d/%Y %H:%M')->strftime('%Y-%m-%d
+%H:%M') for @fields[38,39];
push @data,\@fields;
}
print Dumper \@data;
__DATA__
1925,47365,2,650187016,1,1,"MADE, FOR, DRAWDOWNS, NEVER, P/U",16,IFC 8
+112NP,Standalone-6,,,44,10/22/2015,91607,,B24W02651,,"PA-3, PURE",4/2
+8/2015,1,0,,1,MAN,,CUST,,CUSTOM MATCH,0,TRUE,TRUE,O,C48A0D001EF449E3A
+B97F0B98C811B1B,POS.MISTINT.V0000.UP.Q,PROD_SMISA_BK,414D512050524F44
+5F504F5331393235906F28561D2F0020,10/22/2015 9:29,10/22/2015 9:30
poj | [reply] [Watch: Dir/Any] [d/l] |
|
Except that has appeared to put commas in every space in the quoted fields? Rather than replace any commas in the quoted fields with spaces?
| [reply] [Watch: Dir/Any] |
Re: Replace commas with spaces between quotes, parsing CSV
by bart (Canon) on Apr 16, 2016 at 04:28 UTC
|
Here's mine, a different approach, based on the idea "replace what you want to keep by itself":
s/(,)|(".*?")/ $1 || $2 =~ s(,)( )gr /ge;
This requires a "fairly recent" perl, though I'm unsure exactly when s///r was introduced.
For older (and for newer) perls, you can use the slightly more verbose:
s/(,)|(".*?")/ $1 || do { (my $s = $2) =~ s(,)( )g; $s } /ge;
| [reply] [Watch: Dir/Any] [d/l] [select] |
|
| [reply] [Watch: Dir/Any] [d/l] [select] |
Re: Replace commas with spaces between quotes, parsing CSV
by jellisii2 (Hermit) on Apr 15, 2016 at 18:58 UTC
|
I'd strongly recommend using one of the fine CSV parsing modules to do this for you. | [reply] [Watch: Dir/Any] |
|
Unfortunately, I do not seem to have any of the "Text" modules, nor will I have the option to download any of them.
| [reply] [Watch: Dir/Any] |
|
Text::CSV has a pure Perl version. If you can upload your program to your server, you can also upload the module.
| [reply] [Watch: Dir/Any] |
|
|
|
Re: Replace commas with spaces between quotes, parsing CSV
by Anonymous Monk on Apr 15, 2016 at 19:42 UTC
|
s/"(.*?)"/$1=~ tr#,# #r/ge;
| [reply] [Watch: Dir/Any] [d/l] |
|
|