johnmck has asked for the wisdom of the Perl Monks concerning the following question:

I would be greatly appreciative of any suggestions on how to search/replace within specific fields of a CSV. My dataset looks like this:

C,003187995,0H047013N,20141212,0H,2050,EX,,
C,013139784,0H0430000,20141110,0H,2014,EX,,
C,153188023,0H0020000,20150105,0H,2015,BA,,
C,173167740,MT007015G,20141202,0R,2015,BA,,

As an example, I'd like to replace the first field - where there is a 'C' replaced it with CONDENSED or an 'F' replace with FINAL. In the fifth field instead of 0H I'd like to replace that with OHIO, or 0R with OREGON.

Can anyone suggest an approach to start with? Thanks.

John

Replies are listed 'Best First'.
Re: Search/Replace within fields of CSV
by Tux (Canon) on May 08, 2015 at 18:55 UTC

    I'd say add a header line and use DBD::CSV. Of course you can do it with Text::CSV or Text::CSV_XS.

    I'd use DBD::CSV if the dataset isn't too large (the whole set fits easily in memory).

    I'd use Text::CSV_XS' csv function with on_in that lets you change fields on the fly.

    $ cat test.pl #!/pro/bin/perl use 5.20.0; use warnings; use Text::CSV_XS "csv"; my %state = qw( 0H Ohio 0R Oregon C CONDENSED F Final ); csv (in => "test.csv", on_in => sub { $_[1][0] = $state{$_[1][0]} || $_[1][0]; $_[1][4] = $state{$_[1][4]} || $_[1][4]; }); $ cat test.csv C,003187995,0H047013N,20141212,0H,2050,EX,, C,013139784,0H0430000,20141110,0H,2014,EX,, C,153188023,0H0020000,20150105,0H,2015,BA,, C,173167740,MT007015G,20141202,0R,2015,BA,, $ perl test.pl CONDENSED,003187995,0H047013N,20141212,Ohio,2050,EX,, CONDENSED,013139784,0H0430000,20141110,Ohio,2014,EX,, CONDENSED,153188023,0H0020000,20150105,Ohio,2015,BA,, CONDENSED,173167740,MT007015G,20141202,Oregon,2015,BA,, $

    update: As of version 1.17, you can also use filter (if you prefer that):

    csv (in => "test.csv", filter => { 1 => sub { $_ = $state{$_} || $_ }, 5 => sub { $_ = $state{$_} || $_ }, });

    Enjoy, Have FUN! H.Merijn
Re: Search/Replace within fields of CSV
by aaron_baugher (Curate) on May 08, 2015 at 18:36 UTC

    The safest way to deal with CSV is with a module, and they're easy to use, so there's no reason not to. Let the module cleanly divide up the fields, make the changes you need, and write the results back out to a new file.

    #!/usr/bin/env perl use 5.010; use strict; use warnings; use Text::CSV; my $ic = Text::CSV->new({sep_char => ','}) + or die Text::CSV->error_diag(); my $oc = Text::CSV->new({sep_char => ',', eol => $/ }) or die Text::CSV->error_diag(); open my $if, '<', 'infile' or die $!; open my $of, '>', 'outfile' or die $!; my %states = ( '0H' => 'OHIO', '0R' => 'OREGON', ); while( my $r = $ic->getline($if)){ $r->[0] = 'CONDENSED' if $r->[0] eq 'C'; $r->[0] = 'FINAL' if $r->[0] eq 'F'; $r->[4] = $states{$r->[4]} if $states{$r->[4]}; $oc->print($of, $r); } close $if; close $of;

    Aaron B.
    Available for small or large Perl jobs and *nix system administration; see my home node.

      Thanks very much, Aaron. I was thinking about Text::CSV but I wasn't sure if there was a more practical way to do it. Your code really helps. Thank you.
Re: Search/Replace within fields of CSV
by CountZero (Bishop) on May 08, 2015 at 18:41 UTC
    If you can edit the CSV file and add a first line with field names (I used "First,Second,Third,Fourth,Fifth,Sixth,Seventh,Eight"), then you can use DBI and SQL to do this job for you.
    use Modern::Perl; use DBI; my $dbh = DBI->connect( "dbi:CSV:", undef, undef, { f_ext => '.csv', f_dir => 'D:/Perl/scripts', RaiseError => 1, } ) or die "Cannot connect: $DBI::errstr"; $dbh->do(q/UPDATE test SET First = 'CONDENSED' WHERE test.First = 'C'/ +); $dbh->do(q/UPDATE test SET First = 'FINAL' WHERE test.First = 'F'/); $dbh->do(q/UPDATE test SET Fifth = 'Ohio' WHERE test.Fifth = '0H'/); $dbh->do(q/UPDATE test SET Fifth = 'Oregon' WHERE test.Fifth = '0R'/);

    CountZero

    A program should be light and agile, its subroutines connected like a string of pearls. The spirit and intent of the program should be retained throughout. There should be neither too little or too much, neither needless loops nor useless variables, neither lack of structure nor overwhelming rigidity." - The Tao of Programming, 4.1 - Geoffrey James

    My blog: Imperial Deltronics
    A reply falls below the community's threshold of quality. You may see it by logging in.
Re: Search/Replace within fields of CSV
by vinoth.ree (Monsignor) on May 08, 2015 at 18:32 UTC

    This is a CSV file. In each row there are fields separated with comma. So here is the idea.

    The algorithm

    The process should go like this:

        1.Read in the file line by line.
        2.For each line, extract the 1rd and 5th column and check whatever you want and print. 
    
    Update:
    use strict; use warnings; my $file = 'csv_file.csv'; open(my $fh, '<', $file) or die "Can't read file '$file' [$!]\n"; while (my $line = <$fh>) { chomp $line; my @fields = split(/,/, $line); if ($fields[0] eq 'C') { $fields[0] = 'CONDENSED'; } elsif ($fields[0] eq 'F'){ $fields[0]='FINAL' } if ($fields[4] eq '0H') { $fields[4] = '0HIO' } elsif ($fields[4] eq '0R') { $fields[4] = 'OREGON' } print join(',',@fields); print "\n" }

    Sorry delayed to update the code with module Text::CSV,

    #!/usr/bin/perl use strict; use warnings; use Text::CSV; my $csv = Text::CSV->new({ sep_char => ',' }); my $file = $ARGV[0] or die "Need to get CSV file on the command line\n +"; open(my $data, '<', $file) or die "Could not open '$file' $!\n"; while (my $line = <$data>) { chomp $line; if ($csv->parse($line)) { my @fields = $csv->fields(); if ($fields[0] eq 'C') { $fields[0] = 'CONDENSED'; } elsif ($fields[0] eq 'F'){ $fields[0]='FINAL' } if ($fields[4] eq '0H') { $fields[4] = '0HIO' } elsif ($fields[4] eq '0R') { $fields[4] = 'OREGON' } print join(',',@fields); print "\n" } else { warn "Line could not be parsed: $line\n"; } }

    All is well. I learn by answering your questions...
Re: Search/Replace within fields of CSV
by edimusrex (Monk) on May 08, 2015 at 18:44 UTC
    You could use a series of regex matches and use a new file to write the output. Something along the lines of

    #!/usr/bin/perl use warnings; use strict; my $csv = 'test.csv'; my $complete = 'output.csv'; open FILE, "<$csv"; chomp(my @file = <FILE>); foreach(@file) { my @cleaned; my @array = split(",",$_); foreach(@array) { $_ =~ s/^C$/CONDENSED/g; $_ =~ s/^0H$/OHIO/g; #etc..for what you want to match push @cleaned, $_; } open OUTPUT, ">>$complete"; print OUTPUT join(",",@cleaned)."\n"; close OUTPUT; }

    If you don't want to use any additional modules