Beefy Boxes and Bandwidth Generously Provided by pair Networks
Don't ask to ask, just ask
 
PerlMonks  

Replace commas with spaces between quotes, parsing CSV

by BigRedEO (Acolyte)
on Apr 15, 2016 at 18:23 UTC ( [id://1160556]=perlquestion: print w/replies, xml ) Need Help??

BigRedEO has asked for the wisdom of the Perl Monks concerning the following question:

I've asked this question elsewhere, this after have tried many things I've found searching Google, but have not had luck yet. I will try here (hopefully I'll be much better at posting my question after having so many problems the first time I tried to post a question here earlier this week. I'm such a total noob) -

I have happened upon a problem with a program I'm writing that parses through a CSV file with a few million records: two fields in each record have comments that users have input, and sometimes they use commas within their comments. If there are commas input, that field will be contained in double quotes in the record. I need to replace any commas found in those fields with a space. Here is one such record from the file to give you an idea. -

1925,47365,2,650187016,1,1,"MADE FOR DRAWDOWNS, NEVER P/U",16,IFC 8112NP,Standalone-6,,,44,10/22/2015,91607,,B24W02651,,"PA-3, PURE",4/28/2015,1,0,,1,MAN,,CUST,,CUSTOM MATCH,0,TRUE,TRUE,O,C48A0D001EF449E3AB97F0B98C811B1B,POS.MISTINT.V0000.UP.Q,PROD_SMISA_BK,414D512050524F445F504F5331393235906F28561D2F0020,10/22/2015 9:29,10/22/2015 9:30

NOTE - I do not have the Text::CSV module available to me, nor will it be made available in the server I am using. Here is part of my code in parsing this file. The first thing I do is concatenate the very first three fields and prepend that concatenated field to each line. Then I want to clear out the commas in @fields7,19 (the thing I'm having trouble figuring out), then format the DATE in three fields and the DATETIME in two fields. The only part I can't figure out is replacing those commas with a space. Is there a tr/,/ /; line or regex line or anything else that would work? -

my @data; # Read the lines one by one. while ( $line = <$FH> ) { # split the fields, concatenate the first three fields, # and add it to the beginning of each line in the file chomp($line); my @fields = split(/,/, $line); unshift @fields, join '_', @fields[0..2]; # remove user input commas in fields 7 and 19 $_ = for fields[7,19]; # format DATE and DATETIME fields for MySQL/sqlbatch60 $_ = join '-', (split /\//)[2,0,1] for @fields[14,20,23]; $_ = Time::Piece->strptime($_,'%m/%d/%Y %H:%M')->strftime('%Y- +%m-%d %H:%M') for @fields[38,39]; # write the parsed record back to the file push @data, \@fields; }

Replies are listed 'Best First'.
Re: Replace commas with spaces between quotes, parsing CSV
by GotToBTru (Prior) on Apr 15, 2016 at 18:27 UTC

    More than likely, I learned this trick at PM.

    $i=0; $line = join '', grep { $i++ % 2 ? {s/,/ /g} : 1} split /"/,$line;
    while(<DATA>) { print; $i = 0; print join '', grep { $i++ % 2 ? {s/,/ /g} : 1} split /"/; print "\n"; } __DATA__ one,two,three one,"two,three,five",four "one,two",three one,two,"three,"
    one,two,three one,two,three one,"two,three,five",four one,two three five,four "one,two",three one two,three one,two,"three," one,two,three
    But God demonstrates His own love toward us, in that while we were yet sinners, Christ died for us. Romans 5:8 (NASB)

      I ran this on the OP's data set and it looks like it gives the correct answer, but also throws a couple of warnings in the process. I haven't quite figured out why? Update, new code, try to eliminate warnings... Original code... New version with OP's input line:
      #!usr/bin/perl use warnings; use strict; $|=1; my $line = '1925,47365,2,650187016,1,1,"MADE FOR DRAWDOWNS, NEVER P/U" +,16,IFC 8112NP,Standalone-6,,,44,10/22/2015,91607,,B24W02651,,"PA-3, +PURE",4/28/2015,1,0,,1,MAN,,CUST,,CUSTOM MATCH,0,TRUE,TRUE,O,C48A0D00 +1EF449E3AB97F0B98C811B1B,POS.MISTINT.V0000.UP.Q,PROD_SMISA_BK,414D512 +050524F445F504F5331393235906F28561D2F0020,10/22/2015 9:29,10/22/2015 +9:30'; my $i=0; my $line2 = join '', grep { $i++ % 2 ? {s/,/ /g} : 1} split /"/,$line; print "line 1=$line\n"; print "line 2=$line2\n"; my $j=0; # recoded into a slightly different way... my @csv; foreach (split /"/,$line) { s/,/ /g if $j++ % 2; push @csv, $_; } print "\n\n$line\n"; print @csv,"\n"; __END__ first version, 2 warnings.... Odd number of elements in anonymous hash at C:\Projects_Perl\handledou +blequotecsv.pl line 9. Odd number of elements in anonymous hash at C:\Projects_Perl\handledou +blequotecsv.pl line 9. line 1=1925,47365,2,650187016,1,1,"MADE FOR DRAWDOWNS, NEVER P/U",16,I +FC 8112NP,Standalone-6,,,44,10/22/2015,91607,,B24W02651,,"PA-3, PURE" +,4/28/2015,1,0,,1,MAN,,CUST,,CUSTOM MATCH,0,TRUE,TRUE,O,C48A0D001EF44 +9E3AB97F0B98C811B1B,POS.MISTINT.V0000.UP.Q,PROD_SMISA_BK,414D51205052 +4F445F504F5331393235906F28561D2F0020,10/22/2015 9:29,10/22/2015 9:30 line 2=1925,47365,2,650187016,1,1,MADE FOR DRAWDOWNS NEVER P/U,16,IFC + 8112NP,Standalone-6,,,44,10/22/2015,91607,,B24W02651,,PA-3 PURE,4/2 +8/2015,1,0,,1,MAN,,CUST,,CUSTOM MATCH,0,TRUE,TRUE,O,C48A0D001EF449E3A +B97F0B98C811B1B,POS.MISTINT.V0000.UP.Q,PROD_SMISA_BK,414D512050524F44 +5F504F5331393235906F28561D2F0020,10/22/2015 9:29,10/22/2015 9:30 2nd version no warnings.... 1925,47365,2,650187016,1,1,"MADE FOR DRAWDOWNS, NEVER P/U",16,IFC 8112 +NP,Standalone-6,,,44,10/22/2015,91607,,B24W02651,,"PA-3, PURE",4/28/2 +015,1,0,,1,MAN,,CUST,,CUSTOM MATCH,0,TRUE,TRUE,O,C48A0D001EF449E3AB97 +F0B98C811B1B,POS.MISTINT.V0000.UP.Q,PROD_SMISA_BK,414D512050524F445F5 +04F5331393235906F28561D2F0020,10/22/2015 9:29,10/22/2015 9:30 1925,47365,2,650187016,1,1,MADE FOR DRAWDOWNS NEVER P/U,16,IFC 8112NP +,Standalone-6,,,44,10/22/2015,91607,,B24W02651,,PA-3 PURE,4/28/2015, +1,0,,1,MAN,,CUST,,CUSTOM MATCH,0,TRUE,TRUE,O,C48A0D001EF449E3AB97F0B9 +8C811B1B,POS.MISTINT.V0000.UP.Q,PROD_SMISA_BK,414D512050524F445F504F5 +331393235906F28561D2F0020,10/22/2015 9:29,10/22/2015 9:30 Process completed successfully
      Of course @csv chould be joined back together since it was split on '"' instead of ',' and then re-split on comma. This algorithm does appear to work when reformulated and does its job of getting rid of commas within double quotes without run-time warnings. Replacing the "," with a space might not be the "best". possible improvement is to reduce multiple spaces to single spaces.

        I had tried throwing his line in my script here -

        my $filename = 'tested.csv'; open my $FH, $filename or die "Could not read from $filename <$!>, program halting."; # Read the header line. chomp(my $line = <$FH>); my $i = 0; $line = join '', grep { $i++ % 2 ? {s/,/ /g} : 1} split /"/,$line; my @fields = split(/,/, $line); print Dumper(@fields), $/; my @data; # Read the lines one by one. while($line = <$FH>) {

        and got this when I tried to run the script -

        Use of uninitialized value in join or string at AlterDataNew.pl line 2 +9, <$FH> line 5.
        Curious - what is the line with the pipe here doing - $|=1;
      So how do I use that within my code here? Something like:
      my $i = 0; $_ = join '', grep { $i++ % 2 ? {s/,/ /g} : 1} split /"/ for @fields +[7,19];
      Or is that not correct?

        You'll want to apply this function to $line before you split it into @fields.

        while(...) chomp($line); $line = join '', grep { $i++ % 2 ? {s/,/ /g} : 1} split /"/,$line; @fields = split(/,/,$line);
        But God demonstrates His own love toward us, in that while we were yet sinners, Christ died for us. Romans 5:8 (NASB)

Re: Replace commas with spaces between quotes, parsing CSV
by AnomalousMonk (Archbishop) on Apr 16, 2016 at 01:16 UTC

    I don't know if you have a satisfactory solution yet, but this seems to work. Note that the substitution is done on any and all double-quoted strings in a record, not specific ones. If this is not acceptable, then you already seem able to extract specific fields and the substitution can be done on those fields individually. Note also that the  /r modifier of the  tr///r version of the substitution is available only for Perl versions from 5.14 onward; the other  tr/// expression works in any Perl version. (I've also used the  __DATA__ block from poj's code.)
    File repl_commas_in_csv_1.pl:

    Update: I've just noticed that the AnonyMonk posted essentially the same idea here. I think the double-quote regex I'm using is a bit more robust than that one, but it still must be considered fragile compared to what may appear in a fully-fledged CSV record — a caveat that, I admit, I should have included in my original reply!


    Give a man a fish:  <%-{-{-{-<

Re: Replace commas with spaces between quotes, parsing CSV
by poj (Abbot) on Apr 15, 2016 at 18:58 UTC

    You could try Text::ParseWords

    #!perl use strict; use Time::Piece; use Text::ParseWords; use Data::Dumper; my @data = (); while (my $line = <DATA>){ my @fields = quotewords(',', 0, $line); unshift @fields, join '_', @fields[0..2]; s/,//g for @fields[7,19]; $_ = join '-', (split /\//)[2,0,1] for @fields[14,20,23]; $_ = Time::Piece->strptime($_,'%m/%d/%Y %H:%M')->strftime('%Y-%m-%d +%H:%M') for @fields[38,39]; push @data,\@fields; } print Dumper \@data; __DATA__ 1925,47365,2,650187016,1,1,"MADE, FOR, DRAWDOWNS, NEVER, P/U",16,IFC 8 +112NP,Standalone-6,,,44,10/22/2015,91607,,B24W02651,,"PA-3, PURE",4/2 +8/2015,1,0,,1,MAN,,CUST,,CUSTOM MATCH,0,TRUE,TRUE,O,C48A0D001EF449E3A +B97F0B98C811B1B,POS.MISTINT.V0000.UP.Q,PROD_SMISA_BK,414D512050524F44 +5F504F5331393235906F28561D2F0020,10/22/2015 9:29,10/22/2015 9:30
    poj
      Except that has appeared to put commas in every space in the quoted fields? Rather than replace any commas in the quoted fields with spaces?
Re: Replace commas with spaces between quotes, parsing CSV
by bart (Canon) on Apr 16, 2016 at 04:28 UTC
    Here's mine, a different approach, based on the idea "replace what you want to keep by itself":
    s/(,)|(".*?")/ $1 || $2 =~ s(,)( )gr /ge;
    This requires a "fairly recent" perl, though I'm unsure exactly when s///r was introduced.

    For older (and for newer) perls, you can use the slightly more verbose:

    s/(,)|(".*?")/ $1 || do { (my $s = $2) =~ s(,)( )g; $s } /ge;
      "... I'm unsure exactly when s///r was introduced."

      It was added in v5.14.0. See the Non-destructive substitution section of perl5140delta: Regular Expressions.

      — Ken

Re: Replace commas with spaces between quotes, parsing CSV
by jellisii2 (Hermit) on Apr 15, 2016 at 18:58 UTC
    I'd strongly recommend using one of the fine CSV parsing modules to do this for you.
      Unfortunately, I do not seem to have any of the "Text" modules, nor will I have the option to download any of them.
        Text::CSV has a pure Perl version. If you can upload your program to your server, you can also upload the module.
Re: Replace commas with spaces between quotes, parsing CSV
by Anonymous Monk on Apr 15, 2016 at 19:42 UTC

Log In?
Username:
Password:

What's my password?
Create A New User
Domain Nodelet?
Node Status?
node history
Node Type: perlquestion [id://1160556]
Front-paged by GotToBTru
help
Chatterbox?
and the web crawler heard nothing...

How do I use this?Last hourOther CB clients
Other Users?
Others about the Monastery: (9)
As of 2024-03-28 09:45 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?

    No recent polls found