scotttromley has asked for the wisdom of the Perl Monks concerning the following question:
Hello,
I've got a CSV file of contacts from a customer of mine. Whoever migrated his contacts into Gmail ended up putting the First Name, Last Name, Email, and a few other fields into the notes field of the CSV, and leaving the actual fields undef, and I've taken it upon myself to clean this up.
I decided the best way to handle this would be by learning perl and using it, and I've made decent progress thus far. I've cleaned up the csv as best as I can using Sublime Text 2, as I have lots of experience in it and a fair grasp of regex. Here is the situation I'm in:</p
I have written a Perl program that can (using Text::CSV) import each line of the CSV. It then saves the Notes field for the current line to a scalar $notesField. Here is an example of a few lines of the input data:
,,,,,,,,,,,,,,,,,,,,,,,,,"First Name: Dobbin Last Name: David L. Addre +ss: david@adamsonanddobbin.com City: PO Box 1326407 Pido Road State: +Peterborough ZIP Code: ON Country: K9J 7H5 First Name: Dobbin Last Name: David L. E-mail: david@adamsonanddobbin. +com Address: PO Box 1326407 Pido Road City: Peterborough State: ON ZI +P Code: K9J 7H5",,,,,,Home,743 7790,Other,742 4524,Work,745 5751,,,,, +,,,,,,Adamson And Dobbin Ltd. Mechanical Contractors,,General Manager +,,,,,,,,,,, ,,,,,,,,,,,,,,,,,,,,,,,,,"First Name: Chapleau Last Name: Kathy, Ken A +ddress: 666 FrankFirst Name: Chapleau Last Name: Kathy, Ken City: 666 + Frank ",,,,,,Home,876-9863,,,,,,,,,,,,,,,Admiralty Hall,,Accountant,,,,,,,,, +,,
The data between the double quotes is field[ 25 ] of the line, and is the notes field. You may notice the Last Name: string in the notes field is followed by the first name of the contact, as whoever migrated this data seems to have transposed those two fields
My next step, given that I can read each line in and extract the notes field, is to use regex to match the string following First Name: so I can save it to the appropriate field in that row, the Given Name field of the CSV. The pattern I've come up with is:
if ($notesField =~ /First Name:.*?(?=Last)/) { print "MATCHED!"; }
Which approximately works in Sublime Text 2's Regex search to select what I need it to. However, it doesn't match anything in my Perl program.
My intention is to match each field in the row, concatenate together a new row, and write it out to a new CSV file, preserving the existing Notes field to account for any data that may be lost in the process.
Please help me figure out what I am doing wrong with my regex, as well as any other protips you could offer. I realize this is a pretty messy dataset, but I am really enjoying the challenge and am not easily discouraged. Thank you for your help, if you need any more information please let me know. Here is the full text of my program thus far:
use strict; use warnings; use Text::CSV; my $csv = Text::CSV->new ({ sep_char => ',', binary => 1, auto_diag => 1, allow_loose_quotes => 1 }); my $file = $ARGV[0] or die "Need to load a file from stdin"; open (my $data, '<', $file) or die "Couldn't load file '$file'\n"; while(my $row = $csv->getline($data)) { my @fields = @{$row}; my $notesField = $fields[25]; if (defined $notesField) { if ($notesField =~ /First Name:.*?(?=Last)/) { print "MATCHED!"; } } else { print"No Notes Field \n\n"; } } if (not $csv->eof) { $csv->error_diag(); } close $data;
|
|---|
| Replies are listed 'Best First'. | |
|---|---|
|
Re: Parsing a complex csv, cleaning it up, and exporting it
by Athanasius (Archbishop) on Aug 28, 2014 at 02:48 UTC | |
|
Re: Parsing a complex csv, cleaning it up, and exporting it
by RichardK (Parson) on Aug 28, 2014 at 01:14 UTC | |
|
Re: Parsing a complex csv, cleaning it up, and exporting it
by hdb (Monsignor) on Aug 28, 2014 at 06:52 UTC |