JobC has asked for the wisdom of the Perl Monks concerning the following question:
I am new at using perl. I have a CSV that I am importing with Text::CSV_XS. Currently, I am able to clean up much of the poorly formed data generated by the venerable database output tool (thanks perl people). I would like to extract a piece of data from one field that has multiple fields in it. The original field is a long description that usually contains a #F123456, #123456, #123-F123456, #123-123456, or #12AB-123456 in it. This data floats around from left to right and there should be whitespace before the #. Also, the end of the data is either whitespace, or the end of the field. I could extract the data before or after I clean up the field. Here is a couple of examples of the original and modified data.
#Before: ,"TRAY,HINGED,PLSTC,20 CAV #F32473", #After: ,"TRAY HINGED PLSTC 20 CAV #F32473", #Before: ,"BOX HSC,35-3/4X17-1/4 X 50-1/2 SIMULATOR TALL BOX", #After: ,"BOX HSC 35-3/4X17-1/4 X 50-1/2 SIMULATOR TALL BOX", #Before: ,"PAD, FOAM, 24 X 24 X 1/4 #16193 + 112 SHEETS PER ROLL, ORDER IN FULL ROLLS", #After: ,"PAD FOAM 24 X 24 X 1/4 #16193 112 SHEETS PER ROLL ORDER IN FULL ROLL +S", #Before: ,"PKG LIST,ASST ARM,RAD,300 #F37784", #After: ,"PKG LIST ASST ARM RAD 300 #F37784", #Before: ,"PAD, TOP CAP RE17-30048 #F30121 + CORRUGATED ASSEMBLY, 22-7/8 X 21-1/8 X 4-3/4", #After: ,"PAD TOP CAP RE17-30048 #F30121 CORRUGATED ASSEMBLY 22-7/8 X 21-1/8 X + 4-3/4","11.72000","281.28",
Here is some of the code I am using to process the file. I don't know if I can share all of it without quite a bit of obfuscation. This field would be "$row->5"
while (my $row = $aCSV->getline ($ORIG_CSV)) { #Test row and determine if there is data. If the first element is +empty then that is the last line. #This should prevent errors, and drop the extraneous data at the e +nd of the file. if ($row->[0] =~ /\S/ ) { # Remove extra newline from field 5, 6 and 11 unless preceded + by a carriage return, fields start counting at Zero. # Field 6 and 11 replaces with space for readabililty of the +data. $row->[4] =~ s/ (?<!\x0D) \x0A //gx; $row->[5] =~ s/ (?<!\x0D) \x0A / /gx; $row->[10] =~ s/ (?<!\x0D) \x0A / /gx; # cleanup or truncate fields. We only want data that is usabl +e. # these are replacements so if the string within \Q \E doens' +t exist it isn't replaced # They are specific for each field. REMEMBER Fields start cou +nting at ZERO $row->[5] =~ s/ \Q,\E / /gx; # replace comma in field with s +pace. This makes CSV easier to import. $row->[9] =~ s/ \QRevision : \E //gx; $row->[10] =~ s/ \Q,\E / /gx; # replace comma in field with s +pace. This makes CSV easier to import. $row->[12] =~ s/ \QProduction Item: \E//gx; $row->[13] =~ s/ \x23 //gx; # hexidecimal 0x23 is "#" and +this messes with the regex in one step, but two works. $row->[13] =~ s/ \QWO : \E //gx; # remove the remainder junk + "WO : " $row->[14] =~ s/ \QCutlist color: \E //gx; # Some fields have extra whitespace. Lets clean that up becau +se we want to print some of this data. $row->[5] =~ s/\h+/ /g; $row->[10] =~ s/\h+/ /g; # Something to test on $row->[5] /#[^ ]+/ # Testing print some fields. REMEMBER 1 aka Zero print "$row->[5]\t$row->[9]\t$row->[12]\t$row->[13]\n"; push @rows, $row; } else { last; } }
Since I am new at this please tell me if I should do something different for clarity. Also, if there is a more efficient way to do this then clarity probably wins out, for later troubleshooting. Thanks!
|
|---|
| Replies are listed 'Best First'. | |
|---|---|
|
Re: Extract data from CSV field.
by ww (Archbishop) on Dec 08, 2015 at 22:32 UTC | |
|
Re: Extract data from CSV field.
by poj (Abbot) on Dec 08, 2015 at 22:04 UTC | |
by JobC (Acolyte) on Dec 08, 2015 at 23:49 UTC | |
by JobC (Acolyte) on Dec 09, 2015 at 17:31 UTC | |
by MidLifeXis (Monsignor) on Dec 09, 2015 at 18:15 UTC | |
by poj (Abbot) on Dec 09, 2015 at 17:51 UTC | |
by JobC (Acolyte) on Dec 09, 2015 at 18:15 UTC | |
| |
|
Re: Extract data from CSV field.
by stevieb (Canon) on Dec 08, 2015 at 21:21 UTC | |
|
Re: Extract data from CSV field.
by kcott (Archbishop) on Dec 09, 2015 at 07:27 UTC | |
by JobC (Acolyte) on Dec 09, 2015 at 17:36 UTC | |
|
Re: Extract data from CSV field.
by JobC (Acolyte) on Dec 08, 2015 at 20:46 UTC |