My preference would be to first decompose the notes field into all of its parts, store them in some structure (like a hash), and only then extract the information you are looking for.

Here, a split does exactly that: the capturing group in the regex makes split return the field names along with the values.

use strict;
use warnings;
use Data::Dumper;

while (<DATA>) {
    my ($notesField) = /"(.*)"/;
    my @parts = split /\s?(First Name|Last Name|Address|City|State|ZIP Code|E-mail): /, $notesField;
    shift @parts;
    my %parts = @parts;
    print Dumper \%parts;
}

__DATA__
,,,,,,,,,,,,,,,,,,,,,,,,,"First Name: Dobbin Last Name: David L. Address: david@adamsonanddobbin.com City: PO Box 1326407 Pido Road State: Peterborough ZIP Code: ON Country: K9J 7H5 First Name: Dobbin Last Name: David L. E-mail: david@adamsonanddobbin.com Address: PO Box 1326407 Pido Road City: Peterborough State: ON ZIP Code: K9J 7H5",,,,,,Home,743 7790,Other,742 4524,Work,745 5751,,,,,,,,,,,Adamson And Dobbin Ltd. Mechanical Contractors,,General Manager,,,,,,,,,,,
,,,,,,,,,,,,,,,,,,,,,,,,,"First Name: Chapleau Last Name: Kathy, Ken Address: 666 FrankFirst Name: Chapleau Last Name: Kathy, Ken City: 666 Frank",,,,,,Home,876-9863,,,,,,,,,,,,,,,Admiralty Hall,,Accountant,,,,,,,,,,,

The result would be one hash per record, like this:

$VAR1 = {
          'First Name' => 'Dobbin',
          'ZIP Code' => 'K9J 7H5',
          'Address' => 'PO Box 1326407 Pido Road',
          'Last Name' => 'David L.',
          'City' => 'Peterborough',
          'E-mail' => 'david@adamsonanddobbin.com',
          'State' => 'ON'
        };
$VAR1 = {
          'First Name' => 'Chapleau',
          'Address' => '666 Frank',
          'Last Name' => 'Kathy, Ken',
          'City' => '666 Frank'
        };

Please note that I extract the notes field with a regex only for convenience. Your approach using Text::CSV is clearly the right way to do it.

When using a hash you lose any duplicates. If a field such as First Name occurs twice, the later value overwrites the earlier one, so only the last occurrence survives.
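If the duplicates matter, one option is to collect every value per label in an array instead of assigning the list straight to a hash. The following is only a sketch under that assumption: parse_notes_keep_all is a hypothetical helper name, and the label list is the same one used in the split above.

```perl
use strict;
use warnings;

# Same field labels as in the split above.
my $labels = qr/First Name|Last Name|Address|City|State|ZIP Code|E-mail/;

# Hypothetical helper (not from the original post): returns a hash ref
# mapping each label to an array ref of all its values, in order.
sub parse_notes_keep_all {
    my ($notes) = @_;
    my @parts = split /\s?($labels): /, $notes;
    shift @parts;    # drop whatever precedes the first label
    my %fields;
    while ( my ( $label, $value ) = splice @parts, 0, 2 ) {
        # split drops trailing empty fields, so the last value may be missing
        push @{ $fields{$label} }, defined $value ? $value : '';
    }
    return \%fields;
}

my $fields = parse_notes_keep_all(
        'First Name: Chapleau Last Name: Kathy, Ken Address: 666 Frank'
      . 'First Name: Chapleau Last Name: Kathy, Ken City: 666 Frank' );
print join( ' | ', @{ $fields->{'First Name'} } ), "\n";
```

Every value then stays available as $fields->{'First Name'}[0], $fields->{'First Name'}[1] and so on, and you can decide afterwards which occurrence to trust.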


In reply to Re: Parsing a complex csv, cleaning it up, and exporting it by hdb
in thread Parsing a complex csv, cleaning it up, and exporting it by scotttromley
