in reply to Parsing a complex csv, cleaning it up, and exporting it
My preference would be to first decompose the notes field into all of its parts and store them in some structure (such as a hash), and only then extract the information you are looking for.
Here, a split does exactly that: the capturing group in the regex makes split return the field names along with the values.
    use strict;
    use warnings;
    use Data::Dumper;

    while (<DATA>) {
        # Quick and dirty: grab the single quoted field (the notes) from the line
        my ($notesField) = /"(.*)"/;

        # The capturing group makes split return the field names as well as the values
        my @parts = split /\s?(First Name|Last Name|Address|City|State|ZIP Code|E-mail): /, $notesField;

        shift @parts;          # discard the empty element before the first field name
        my %parts = @parts;    # name/value pairs; a later duplicate overwrites an earlier one
        print Dumper \%parts;
    }

    __DATA__
    ,,,,,,,,,,,,,,,,,,,,,,,,,"First Name: Dobbin Last Name: David L. Address: david@adamsonanddobbin.com City: PO Box 1326407 Pido Road State: Peterborough ZIP Code: ON Country: K9J 7H5 First Name: Dobbin Last Name: David L. E-mail: david@adamsonanddobbin.com Address: PO Box 1326407 Pido Road City: Peterborough State: ON ZIP Code: K9J 7H5",,,,,,Home,743 7790,Other,742 4524,Work,745 5751,,,,,,,,,,,Adamson And Dobbin Ltd. Mechanical Contractors,,General Manager,,,,,,,,,,,
    ,,,,,,,,,,,,,,,,,,,,,,,,,"First Name: Chapleau Last Name: Kathy, Ken Address: 666 FrankFirst Name: Chapleau Last Name: Kathy, Ken City: 666 Frank",,,,,,Home,876-9863,,,,,,,,,,,,,,,Admiralty Hall,,Accountant,,,,,,,,,,,,
The result would be a hash like this:
    $VAR1 = {
              'First Name' => 'Dobbin',
              'ZIP Code' => 'K9J 7H5',
              'Address' => 'PO Box 1326407 Pido Road',
              'Last Name' => 'David L.',
              'City' => 'Peterborough',
              'E-mail' => 'david@adamsonanddobbin.com',
              'State' => 'ON'
            };
    $VAR1 = {
              'First Name' => 'Chapleau',
              'Address' => '666 Frank',
              'Last Name' => 'Kathy, Ken',
              'City' => '666 Frank'
            };
Please note that my extraction of the notes field is only done using a regex for convenience. Your approach using Text::CSV is clearly the right way to do it.
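For completeness, here is a minimal sketch of how the notes field could be pulled out with Text::CSV instead of the regex. It assumes Text::CSV is installed from CPAN and that the notes field is the one starting with "First Name: ". The helper name notes_field and the shortened sample line are made up for illustration.

```perl
use strict;
use warnings;
use Text::CSV;

# Hypothetical helper: return the notes field of one CSV line, or undef.
sub notes_field {
    my ( $csv, $line ) = @_;
    $csv->parse($line) or return;

    # Assumption: the notes field is the one that starts with "First Name: ".
    my ($notes) = grep { defined($_) && /^First Name: / } $csv->fields;
    return $notes;
}

my $csv = Text::CSV->new( { binary => 1 } )
    or die "Cannot use Text::CSV: " . Text::CSV->error_diag;

# Shortened sample line; the comma inside the quoted field is handled by the parser.
my $line = ',,"First Name: Chapleau Last Name: Kathy, Ken City: 666 Frank",,Home,876-9863';
print notes_field( $csv, $line ), "\n";
```

The string returned by notes_field can then be handed to the same split as above.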
Be aware that a plain hash loses duplicates: if a field such as First Name occurs twice, only the last occurrence survives.
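If the duplicates matter, one option is to collect the values in a hash of arrays. The following is only a sketch of that idea, using the same split as above; the helper name parse_notes is invented for this example.

```perl
use strict;
use warnings;

# Hypothetical helper: split a notes string into { field name => [ values... ] }
sub parse_notes {
    my ($notes) = @_;
    my @parts = split /\s?(First Name|Last Name|Address|City|State|ZIP Code|E-mail): /, $notes;
    shift @parts;    # discard the empty element before the first field name

    my %fields;
    # Take name/value pairs off the front and keep every value per name
    while ( my ( $name, $value ) = splice @parts, 0, 2 ) {
        push @{ $fields{$name} }, $value;
    }
    return \%fields;
}

my $fields = parse_notes(
    'First Name: Chapleau Last Name: Kathy, Ken Address: 666 FrankFirst Name: Chapleau Last Name: Kathy, Ken City: 666 Frank'
);
print join( ' | ', @{ $fields->{'First Name'} } ), "\n";    # prints "Chapleau | Chapleau"
```

Each value is then an array reference, so you can decide afterwards whether to keep the first entry, the last one, or only the distinct ones.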