Search/Replace within fields of CSV

johnmck has asked for the wisdom of the Perl Monks concerning the following question:

Replies are listed 'Best First'.
Re: Search/Replace within fields of CSV by Tux (Canon) on May 08, 2015 at 18:55 UTC
I'd say add a header line and use DBD::CSV. Of course you can do it with Text::CSV or Text::CSV_XS. I'd use DBD::CSV if the dataset isn't too large (the whole set fits easily in memory). I'd use Text::CSV_XS' csv function with on_in that lets you change fields on the fly. $ cat test.pl #!/pro/bin/perl use 5.20.0; use warnings; use Text::CSV_XS "csv"; my %state = qw( 0H Ohio 0R Oregon C CONDENSED F Final ); csv (in => "test.csv", on_in => sub { $_[1][0] = $state{$_[1][0]} \|\| $_[1][0]; $_[1][4] = $state{$_[1][4]} \|\| $_[1][4]; }); $ cat test.csv C,003187995,0H047013N,20141212,0H,2050,EX,, C,013139784,0H0430000,20141110,0H,2014,EX,, C,153188023,0H0020000,20150105,0H,2015,BA,, C,173167740,MT007015G,20141202,0R,2015,BA,, $ perl test.pl CONDENSED,003187995,0H047013N,20141212,Ohio,2050,EX,, CONDENSED,013139784,0H0430000,20141110,Ohio,2014,EX,, CONDENSED,153188023,0H0020000,20150105,Ohio,2015,BA,, CONDENSED,173167740,MT007015G,20141202,Oregon,2015,BA,, $ [download] update: As of version 1.17, you can also use `filter` (if you prefer that): `csv (in => "test.csv", filter => { 1 => sub { $_ = $state{$_} \|\| $_ }, 5 => sub { $_ = $state{$_} \|\| $_ }, });` [download] Enjoy, Have FUN! H.Merijn	[reply] [d/l] [select]
Re: Search/Replace within fields of CSV by aaron_baugher (Curate) on May 08, 2015 at 18:36 UTC
The safest way to deal with CSV is with a module, and they're easy to use, so there's no reason not to. Let the module cleanly divide up the fields, make the changes you need, and write the results back out to a new file. #!/usr/bin/env perl use 5.010; use strict; use warnings; use Text::CSV; my $ic = Text::CSV->new({sep_char => ','}) + or die Text::CSV->error_diag(); my $oc = Text::CSV->new({sep_char => ',', eol => $/ }) or die Text::CSV->error_diag(); open my $if, '<', 'infile' or die $!; open my $of, '>', 'outfile' or die $!; my %states = ( '0H' => 'OHIO', '0R' => 'OREGON', ); while( my $r = $ic->getline($if)){ $r->[0] = 'CONDENSED' if $r->[0] eq 'C'; $r->[0] = 'FINAL' if $r->[0] eq 'F'; $r->[4] = $states{$r->[4]} if $states{$r->[4]}; $oc->print($of, $r); } close $if; close $of; [download] Aaron B. Available for small or large Perl jobs and *nix system administration; see my home node.	[reply] [d/l]
Re^2: Search/Replace within fields of CSV by johnmck (Initiate) on May 08, 2015 at 18:41 UTC
Thanks very much, Aaron. I was thinking about Text::CSV but I wasn't sure if there was a more practical way to do it. Your code really helps. Thank you.	[reply]
Re: Search/Replace within fields of CSV by CountZero (Bishop) on May 08, 2015 at 18:41 UTC
If you can edit the CSV file and add a first line with field names (I used `"First,Second,Third,Fourth,Fifth,Sixth,Seventh,Eight"`), then you can use DBI and SQL to do this job for you. `use Modern::Perl; use DBI; my $dbh = DBI->connect( "dbi:CSV:", undef, undef, { f_ext => '.csv', f_dir => 'D:/Perl/scripts', RaiseError => 1, } ) or die "Cannot connect: $DBI::errstr"; $dbh->do(q/UPDATE test SET First = 'CONDENSED' WHERE test.First = 'C'/ +); $dbh->do(q/UPDATE test SET First = 'FINAL' WHERE test.First = 'F'/); $dbh->do(q/UPDATE test SET Fifth = 'Ohio' WHERE test.Fifth = '0H'/); $dbh->do(q/UPDATE test SET Fifth = 'Oregon' WHERE test.Fifth = '0R'/);` [download] CountZero A program should be light and agile, its subroutines connected like a string of pearls. The spirit and intent of the program should be retained throughout. There should be neither too little or too much, neither needless loops nor useless variables, neither lack of structure nor overwhelming rigidity." - The Tao of Programming, 4.1 - Geoffrey James My blog: Imperial Deltronics	[reply] [d/l] [select]
A reply falls below the community's threshold of quality. You may see it by logging in.
Re: Search/Replace within fields of CSV by vinoth.ree (Monsignor) on May 08, 2015 at 18:32 UTC
This is a CSV file. In each row there are fields separated with comma. So here is the idea. The algorithm The process should go like this: 1.Read in the file line by line. 2.For each line, extract the 1rd and 5th column and check whatever you want and print. Update: `use strict; use warnings; my $file = 'csv_file.csv'; open(my $fh, '<', $file) or die "Can't read file '$file' [$!]\n"; while (my $line = <$fh>) { chomp $line; my @fields = split(/,/, $line); if ($fields[0] eq 'C') { $fields[0] = 'CONDENSED'; } elsif ($fields[0] eq 'F'){ $fields[0]='FINAL' } if ($fields[4] eq '0H') { $fields[4] = '0HIO' } elsif ($fields[4] eq '0R') { $fields[4] = 'OREGON' } print join(',',@fields); print "\n" }` [download] Sorry delayed to update the code with module Text::CSV, #!/usr/bin/perl use strict; use warnings; use Text::CSV; my $csv = Text::CSV->new({ sep_char => ',' }); my $file = $ARGV[0] or die "Need to get CSV file on the command line\n +"; open(my $data, '<', $file) or die "Could not open '$file' $!\n"; while (my $line = <$data>) { chomp $line; if ($csv->parse($line)) { my @fields = $csv->fields(); if ($fields[0] eq 'C') { $fields[0] = 'CONDENSED'; } elsif ($fields[0] eq 'F'){ $fields[0]='FINAL' } if ($fields[4] eq '0H') { $fields[4] = '0HIO' } elsif ($fields[4] eq '0R') { $fields[4] = 'OREGON' } print join(',',@fields); print "\n" } else { warn "Line could not be parsed: $line\n"; } } [download] *All is well. I learn by answering your questions...*	[reply] [d/l] [select]
Re: Search/Replace within fields of CSV by edimusrex (Monk) on May 08, 2015 at 18:44 UTC
You could use a series of regex matches and use a new file to write the output. Something along the lines of `#!/usr/bin/perl use warnings; use strict; my $csv = 'test.csv'; my $complete = 'output.csv'; open FILE, "<$csv"; chomp(my @file = <FILE>); foreach(@file) { my @cleaned; my @array = split(",",$_); foreach(@array) { $_ =~ s/^C$/CONDENSED/g; $_ =~ s/^0H$/OHIO/g; #etc..for what you want to match push @cleaned, $_; } open OUTPUT, ">>$complete"; print OUTPUT join(",",@cleaned)."\n"; close OUTPUT; }` [download] If you don't want to use any additional modules	[reply] [d/l]

The algorithm