hesco has asked for the wisdom of the Perl Monks concerning the following question:
So as not to further distract the cb from its discussion of Zappa's life's work, I bring this question here.
I am working on a script to clean and import a large CSV data file into a postgres backend. My duplicate checks are failing to prevent fatal errors related to unique data constraints applied to some tables. As I drill into this issue, what I'm finding is that my regexes are not working as I thought they would.
    #!/usr/bin/perl -w
    use strict;
    use warnings;

    open ('CSV','<',$file);
    while (<CSV>){
      $counter++;
      @fields = split ",",$_;
      my($ndx);
      foreach $ndx (0 .. (0 + @fields -1)){
        $fields[$ndx] =~ s/\s*$//g;
        $fields[$ndx] =~ s/'/\\'/g;
        $fields[$ndx] =~ s/"//g;
        if($ndx == 32){
          print STDERR "For field $ndx, the value is now: |$fields[$ndx]|\n";
          my @chars = split(//, $fields[$ndx]);
          foreach my $chr (@chars) {
            print STDERR '|' . ord($chr) . '|,';
          }
          print STDERR "\n";
        }
      }
    }
    close('CSV');

This is yielding debug output which looks like this:

    For field 32, the value is now: |096  |
    |48|,|57|,|54|,|32|,|32|,

Field #32 is only a representative sample. This issue is showing up on several fields actually.
Can anyone please explain why this is happening? And, more importantly, what can I do about it?
Thanks,
-- Hugh
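One possible reading of the debug output, offered as a sketch and not a confirmed diagnosis: if the raw field is quoted in the CSV file, the trailing-whitespace substitution runs while the closing quote is still the last character, so `\s*$` has nothing to strip, and the later `s/"//g` then exposes the spaces that were inside the quotes. The sample value `"096  "` is an assumption (the original data is not shown in the post), and the helper names `clean_original_order` and `clean_quotes_first` are hypothetical, introduced only to compare the two orderings.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: applies the three substitutions in the
# same order as the script in the post.
sub clean_original_order {
    my ($field) = @_;
    $field =~ s/\s*$//g;     # runs while the closing quote is still the last character
    $field =~ s/'/\\'/g;     # escape apostrophes
    $field =~ s/"//g;        # quotes removed last, exposing any inner trailing spaces
    return $field;
}

# Hypothetical helper: removes the quotes first, then trims.
sub clean_quotes_first {
    my ($field) = @_;
    $field =~ s/"//g;        # drop the quotes before looking at whitespace
    $field =~ s/\s+$//;      # trailing whitespace is now really at the end
    $field =~ s/'/\\'/g;     # escape apostrophes
    return $field;
}

# Assumed sample: a quoted field with two spaces before the closing quote.
my $raw = '"096  "';

printf "original order: |%s|\n", clean_original_order($raw);
printf "quotes first:   |%s|\n", clean_quotes_first($raw);
```

Under that assumption, the first call leaves the two trailing spaces in place, matching the `|48|,|57|,|54|,|32|,|32|,` dump above, while the second returns `096`. Separately, splitting on `","` will misparse quoted fields that contain commas, so a dedicated parser such as the CPAN module Text::CSV is generally a safer choice for real data.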
Replies are listed 'Best First'.

Re: Baffled by data cleaning regex issue
by ikegami (Patriarch) on Nov 18, 2008 at 23:49 UTC

Re: Baffled by data cleaning regex issue
by gone2015 (Deacon) on Nov 19, 2008 at 00:26 UTC

Re: Baffled by data cleaning regex issue
by ikegami (Patriarch) on Nov 18, 2008 at 23:55 UTC

Re: Baffled by data cleaning regex issue
by eye (Chaplain) on Nov 19, 2008 at 01:26 UTC