in reply to Help formatting text to delimited text in file

What does your input look like when Mrs. Storer marries again and is now known as Mrs. Storer-Goods? You know, she's one of those modern women who won't give up her name. Especially not since she had to walk over the former Mrs. Storer's dead body to get it, but I digress.

In other words, can there be a "-" in the data, and if so, is it escaped?


holli

You can lead your users to water, but alas, you cannot drown them.

Re^2: Help formatting text to delimited text in file
by AnomalousMonk (Archbishop) on Apr 21, 2019 at 22:31 UTC
Re^2: Help formatting text to delimited text in file
by Marshall (Canon) on Apr 21, 2019 at 21:30 UTC
    This is a completely valid point.
    With names there are of course always the additional complications of titles and suffixes: "XXX, MD", "Jr.", "Sr.", "III", etc. Sometimes people have four names, or just two, instead of three. Wild cases like the single name "Madonna" also happen. A friend of mine didn't get fully divorced, but did drop her last name (which she got from her husband). A weird thing, short of going back to her maiden name.

    In my code at Re: Help formatting text to delimited text in file, my line
        my ($name1, $name2) = split /\s*-\s*/,$sub_name;
    was originally
        my ($name1, $name2) = split /\s+-\s+/,$sub_name;
    until I ran it and saw "Ophelia- Mrs. Storer" instead of the expected "Ophelia - Mrs. Storer".

    I am not sure whether the missing space before the "-" is a typo. As a quick decision, I changed the split regex to allow optional whitespace before and after the "-" so that it works with the OP's posted data. Usually a hyphenated name is printed without spaces on either side of the hyphen. Mileage varies.
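
To make the difference between the two patterns concrete, here is a minimal sketch. The helper name split_name and the sample list are made up for illustration; only the value "Ophelia- Mrs. Storer" comes from the run described above.

```perl
use strict;
use warnings;

# Hypothetical sample values modelled on the two spellings discussed above.
my @samples = ('Ophelia - Mrs. Storer', 'Ophelia- Mrs. Storer');

# Split $sub_name with a caller-supplied pattern and return the pieces.
sub split_name {
    my ($pattern, $sub_name) = @_;
    return split $pattern, $sub_name;
}

for my $sub_name (@samples) {
    my @strict = split_name(qr/\s+-\s+/, $sub_name);   # whitespace required on both sides
    my @loose  = split_name(qr/\s*-\s*/, $sub_name);   # whitespace optional on both sides
    printf "%-24s strict: %d field(s), loose: %d field(s)\n",
        "'$sub_name'", scalar @strict, scalar @loose;
}
```

The strict pattern leaves "Ophelia- Mrs. Storer" as a single field because nothing precedes the hyphen, while the loose pattern splits both spellings in two. The price of the loose pattern is that it would also split a hyphenated surname.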

    With just 2 example lines, we can't solve every potential case. There is always some iteration involved when working ad hoc without a complete spec. It could be that requiring a space after the "-" is enough to tell a delimiter apart from a hyphenated name like "Smith-Jones". Not sure.
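
A rough sketch of that idea, untested against the real data: treat a "-" as a delimiter only when at least one whitespace character follows it. The function name split_on_delimiter and the "Storer-Goods" input line are assumptions for illustration.

```perl
use strict;
use warnings;

# Delimiter = a hyphen with optional whitespace before it and at least one
# whitespace character after it. The hyphen inside "Storer-Goods" has no
# trailing whitespace, so it is left alone. The limit of 2 stops after the
# first delimiter.
sub split_on_delimiter {
    my ($sub_name) = @_;
    return split /\s*-\s+/, $sub_name, 2;
}

# Hypothetical input; only the first value appears in the thread.
for my $sub_name ('Ophelia- Mrs. Storer', 'Ophelia - Mrs. Storer-Goods') {
    my ($name1, $name2) = split_on_delimiter($sub_name);
    print "[$name1] [$name2]\n";
}
```

This still splits a surname that was typed as "Smith- Jones", so it is a heuristic rather than a fix.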

    I think my suggestion to count hyphens on each line and identify outliers is a good one. Modify the code accordingly.
    I still suspect that: "Ophelia- Mrs. Storer" is a typo.
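
The hyphen-counting suggestion could look roughly like this. The function names, the sample lines, and the assumption that a normal line contains exactly one "-" are all mine; adjust the expected count to whatever the real printout uses.

```perl
use strict;
use warnings;

# Count the "-" characters in one line (tr/// without a replacement list
# only counts, it does not modify the string).
sub hyphen_count {
    my ($line) = @_;
    return ($line =~ tr/-//);
}

# Return [line_number, count, text] for every line whose hyphen count
# differs from $expected.
sub hyphen_outliers {
    my ($expected, @lines) = @_;
    my @outliers;
    for my $i (0 .. $#lines) {
        my $count = hyphen_count($lines[$i]);
        push @outliers, [ $i + 1, $count, $lines[$i] ] if $count != $expected;
    }
    return @outliers;
}

# Hypothetical data: one delimiter hyphen per line is assumed to be normal.
my @lines = (
    'Ophelia - Mrs. Storer',
    'Ophelia - Mrs. Storer-Goods',
    'Madonna',
);
printf "line %d has %d hyphen(s): %s\n", @$_ for hyphen_outliers(1, @lines);
```

Lines that get flagged are the ones worth eyeballing before deciding whether the split regex needs another iteration.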
    Update: Oh, in this type of printout, I sincerely doubt that there are any "escape" characters like "\" to guide the process. Could be, but doubtful. I am working on a project right now where I have to parse several types of printouts designed for humans. Perl is an excellent language for this! Regex matching is an operator rather than an object, so I can quickly iterate and fine-tune the parsing functions.