comment on

Hello,

I've got a CSV file of contacts from a customer of mine. Whoever migrated his contacts into Gmail ended up putting the First Name, Last Name, Email, and a few other fields into the notes field of the CSV, and leaving the actual fields undef, and I've taken it upon myself to clean this up.

I decided the best way to handle this would be by learning perl and using it, and I've made decent progress thus far. I've cleaned up the csv as best as I can using Sublime Text 2, as I have lots of experience in it and a fair grasp of regex. Here is the situation I'm in:</p

I have written a Perl program that can (using Text::CSV) import each line of the CSV. It then saves the Notes field for the current line to a scalar $notesField. Here is an example of a few lines of the input data:

,,,,,,,,,,,,,,,,,,,,,,,,,"First Name: Dobbin Last Name: David L. Addre
+ss: david@adamsonanddobbin.com City: PO Box 1326407 Pido Road State: 
+Peterborough ZIP Code: ON Country: K9J 7H5
First Name: Dobbin Last Name: David L. E-mail: david@adamsonanddobbin.
+com Address: PO Box 1326407 Pido Road City: Peterborough State: ON ZI
+P Code: K9J 7H5",,,,,,Home,743 7790,Other,742 4524,Work,745 5751,,,,,
+,,,,,,Adamson And Dobbin Ltd. Mechanical Contractors,,General Manager
+,,,,,,,,,,,
,,,,,,,,,,,,,,,,,,,,,,,,,"First Name: Chapleau Last Name: Kathy, Ken A
+ddress: 666 FrankFirst Name: Chapleau Last Name: Kathy, Ken City: 666
+ Frank
",,,,,,Home,876-9863,,,,,,,,,,,,,,,Admiralty Hall,,Accountant,,,,,,,,,
+,,
[download]

The data between the double quotes is field[ 25 ] of the line, and is the notes field. You may notice the Last Name: string in the notes field is followed by the first name of the contact, as whoever migrated this data seems to have transposed those two fields

My next step, given that I can read each line in and extract the notes field, is to use regex to match the string following First Name: so I can save it to the appropriate field in that row, the Given Name field of the CSV. The pattern I've come up with is:

if ($notesField =~ /First Name:.*?(?=Last)/)
{
    print "MATCHED!";
}
[download]

Which approximately works in Sublime Text 2's Regex search to select what I need it to. However, it doesn't match anything in my Perl program.

My intention is to match each field in the row, concatenate together a new row, and write it out to a new CSV file, preserving the existing Notes field to account for any data that may be lost in the process.

Please help me figure out what I am doing wrong with my regex, as well as any other protips you could offer. I realize this is a pretty messy dataset, but I am really enjoying the challenge and am not easily discouraged. Thank you for your help, if you need any more information please let me know. Here is the full text of my program thus far:

use strict;
use warnings;
use Text::CSV;

my $csv = Text::CSV->new ({
    sep_char => ',',
    binary => 1,
    auto_diag => 1,
    allow_loose_quotes => 1
    });
my $file = $ARGV[0] or die "Need to load a file from stdin";

open (my $data, '<', $file) or die "Couldn't load file '$file'\n";
while(my $row = $csv->getline($data)) {
    my @fields = @{$row};
    my $notesField = $fields[25];
    if (defined $notesField) 
    {
        if ($notesField =~ /First Name:.*?(?=Last)/)
        {
            print "MATCHED!";
        }
        
    }
    else
    {
        print"No Notes Field \n\n";
    }
}
    
if (not $csv->eof) {
    $csv->error_diag();
}
close $data;
[download]

In reply to Parsing a complex csv, cleaning it up, and exporting it by scotttromley

Posts are HTML formatted. Put <p> </p> tags around your paragraphs. Put <code> </code> tags around your code and data!

Titles consisting of a single word are discouraged, and in most cases are disallowed outright.

Read Where should I post X? if you're not absolutely sure you're posting in the right place.

Please read these before you post! —

Posts may use any of the Perl Monks Approved HTML tags:

a, abbr, b, big, blockquote, br, caption, center, col, colgroup, dd, del, details, div, dl, dt, em, font, h1, h2, h3, h4, h5, h6, hr, i, ins, li, ol, p, pre, readmore, small, span, spoiler, strike, strong, sub, summary, sup, table, tbody, td, tfoot, th, thead, tr, tt, u, ul, wbr

You may need to use entities for some characters, as follows. (Exception: Within code tags, you can put the characters literally.)

	For:		Use:
	&		`&`
	<		`<`
	>		`>`
	[		`[`
	]		`]`

Link using PerlMonks shortcuts! What shortcuts can I use for linking?

See Writeup Formatting Tips and other pages linked from there for more info.