This is a completely valid point.
With names there are of course always the additional complications of titles and suffixes: "XXX, MD", Jr., Sr., III, etc. Sometimes people have four names or just two instead of three. Wild cases like the single name "Madonna" also happen. A friend of mine didn't get fully divorced, but did drop her last name (which she got from her husband), which is a weird step short of going back to her maiden name.

In my code at Re: Help formatting text to delimited text in file, my line
    my ($name1, $name2) = split /\s*-\s*/, $sub_name;
was originally
    my ($name1, $name2) = split /\s+-\s+/, $sub_name;
until I ran it and saw "Ophelia- Mrs. Storer" instead of the expected "Ophelia - Mrs. Storer".

I am not sure whether the missing space before the "-" is a typo. As a quick decision, I changed the split regex to make the spaces before and after the hyphen optional, so that it works with the OP's posted data. Usually a hyphenated name is printed without spaces on either side of the hyphen. Mileage varies.
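To make the difference between the two patterns concrete, here is a minimal sketch that runs both on the two spellings quoted above. The helper names split_strict and split_lenient are made up for this illustration and are not part of the code in the earlier post.

```perl
use strict;
use warnings;

# Original pattern: whitespace is required on both sides of the hyphen.
sub split_strict  { my ($s) = @_; return split /\s+-\s+/, $s }

# Revised pattern: whitespace on either side of the hyphen is optional.
sub split_lenient { my ($s) = @_; return split /\s*-\s*/, $s }

for my $sub_name ('Ophelia - Mrs. Storer', 'Ophelia- Mrs. Storer') {
    my @strict  = split_strict($sub_name);
    my @lenient = split_lenient($sub_name);
    printf "%-25s strict: %d field(s)  lenient: %d field(s)\n",
        "'$sub_name'", scalar @strict, scalar @lenient;
}
```

The strict pattern finds no separator in "Ophelia- Mrs. Storer", so the whole string comes back as a single field. The lenient pattern splits both spellings, but it would also split a hyphenated surname.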

With just 2 example lines, we can't solve every potential case. There is always some iteration involved when working ad hoc without a complete spec. It could be that requiring a space after the "-" is enough to tell a field separator apart from a hyphenated surname like "Smith-Jones". Not sure.
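One way to try the "space after the hyphen" idea is sketched below. This is an untested guess against the real data: the helper name split_on_spaced_hyphen and the "Mary Smith-Jones" line are invented for illustration, and only the two "Ophelia" spellings come from the thread.

```perl
use strict;
use warnings;

# Hypothetical helper: a hyphen counts as a field separator only when
# whitespace follows it. Whitespace before it is optional. The limit of 2
# keeps any later separators inside the second field.
sub split_on_spaced_hyphen {
    my ($s) = @_;
    return split /\s*-\s+/, $s, 2;
}

my @samples = (
    'Ophelia - Mrs. Storer',
    'Ophelia- Mrs. Storer',
    'Mary Smith-Jones - Mrs. Storer',   # made-up line, not from the OP's data
);

for my $line (@samples) {
    my ($name1, $name2) = split_on_spaced_hyphen($line);
    $name2 = '' unless defined $name2;
    print "[$name1] [$name2]\n";
}
```

With this rule "Smith-Jones" stays in one piece, while both "Ophelia" spellings still split. It would still misfire on a typo such as "Smith- Jones", so it narrows the problem rather than solving it.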

I think my suggestion to count hyphens on each line and identify outliers is a good one. Modify code accordingly.
I still suspect that "Ophelia- Mrs. Storer" is a typo.
Update: Oh, in this type of printout, I sincerely doubt that there are any "escape" characters like "\" to guide the process. Could be, but doubtful. I am working on a project right now where I have to parse several types of printouts designed for humans. Perl is an excellent language for this! In Perl a regex is an operator rather than an object, so I can quickly iterate and fine-tune the parsing functions.
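The hyphen-counting suggestion above could look something like the following sketch. The function name hyphen_outliers and the sample lines other than "Ophelia - Mrs. Storer" are assumptions made up for this example; the real input would be the lines of the OP's file.

```perl
use strict;
use warnings;

# Count the literal hyphens on each line, take the most common count as
# "typical", and report the lines whose count differs from it.
# Returns ($typical, [line_number, hyphen_count, text], ...).
sub hyphen_outliers {
    my @lines  = @_;
    my @counts = map { my $copy = $_; scalar($copy =~ tr/-//) } @lines;

    my %freq;
    $freq{$_}++ for @counts;

    # Most frequent count first; ties go to the smaller count.
    my ($typical) = sort { $freq{$b} <=> $freq{$a} || $a <=> $b } keys %freq;

    my @outliers = map  { [ $_ + 1, $counts[$_], $lines[$_] ] }
                   grep { $counts[$_] != $typical } 0 .. $#lines;
    return ($typical, @outliers);
}

# Made-up sample data, apart from the first line.
my @sample = (
    'Ophelia - Mrs. Storer',
    'Jane - Mrs. Brown',
    'Mary Smith-Jones - Mrs. Green',
);

my ($typical, @odd) = hyphen_outliers(@sample);
print "typical hyphen count: $typical\n";
printf "line %d has %d hyphen(s): %s\n", @$_ for @odd;
```

Lines flagged this way are the ones worth eyeballing before deciding how the split regex should treat them.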


In reply to Re^2: Help formatting text to delimited text in file by Marshall
in thread Help formatting text to delimited text in file by jcg3525
