So as not to further distract the cb from its discussion of Zappa's life's work, I bring this question here.
I am working on a script to clean and import a large CSV data file into a Postgres backend. My duplicate checks are failing to prevent fatal errors from the unique constraints applied to some tables. As I drill into this, what I'm finding is that my regexes are not working as I thought they would.
Here is the relevant portion of the script:

    #!/usr/bin/perl -w
    use strict;
    use warnings;

    my $file    = $ARGV[0];    # assigned earlier in the full script
    my $counter = 0;

    open( CSV, '<', $file ) or die "Cannot open $file: $!";
    while (<CSV>) {
        $counter++;
        my @fields = split ",", $_;
        foreach my $ndx ( 0 .. $#fields ) {
            $fields[$ndx] =~ s/\s*$//g;    # strip trailing whitespace
            $fields[$ndx] =~ s/'/\\'/g;    # backslash-escape single quotes
            $fields[$ndx] =~ s/"//g;       # drop double quotes
            if ( $ndx == 32 ) {
                print STDERR "For field $ndx, the value is now: |$fields[$ndx]|\n";
                my @chars = split //, $fields[$ndx];
                foreach my $chr (@chars) {
                    print STDERR '|' . ord($chr) . '|,';
                }
                print STDERR "\n";
            }
        }
    }
    close(CSV);

This is yielding debug output which looks like this:

    For field 32, the value is now: |096  |
    |48|,|57|,|54|,|32|,|32|,

Field #32 is only a representative sample; this issue is showing up on several fields, actually.
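One thing I wonder about: split ",", $_ will mangle any quoted field that contains an embedded comma, so the value landing in field 32 may not even be the column I think it is. For comparison, here is a minimal sketch of the same loop using Text::CSV (the filename is a placeholder, not my real data file):

    #!/usr/bin/perl
    use strict;
    use warnings;
    use Text::CSV;

    my $file = 'import.csv';    # placeholder filename

    my $csv = Text::CSV->new( { binary => 1 } )
        or die 'Cannot use Text::CSV: ' . Text::CSV->error_diag();

    open my $fh, '<', $file or die "Cannot open $file: $!";
    while ( my $row = $csv->getline($fh) ) {
        # $row is an arrayref of parsed fields; embedded commas and
        # surrounding double quotes are already handled by Text::CSV.
        for my $field (@$row) {
            $field =~ s/^\s+//;    # strip leading whitespace
            $field =~ s/\s+$//;    # strip trailing whitespace, including any \r
        }
    }
    close $fh;

If stray characters like those trailing spaces survive into the columns I compare for duplicates, then "096" and "096  " would look like two distinct keys, which could be why my duplicate checks are letting constraint violations through.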
Can anyone please explain why this is happening? And more importantly, what can I do about it?
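For what it's worth, the cleaned rows are ultimately fed into INSERT statements, and I gather DBI placeholders would make the quote-escaping regexes unnecessary, since the driver quotes each value itself. A rough sketch, with a made-up table and columns:

    use DBI;

    # The connection parameters and the voters table are made up for illustration.
    my $dbh = DBI->connect( 'dbi:Pg:dbname=import', 'user', 'secret',
        { RaiseError => 1, AutoCommit => 1 } );

    my $sth = $dbh->prepare('INSERT INTO voters (id, precinct) VALUES (?, ?)');

    my @fields = ( '096', '12' );    # example values
    # Placeholders quote each value, so no s/'/\\'/g escaping is needed.
    $sth->execute( $fields[0], $fields[1] );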
Thanks,
-- Hugh