in reply to Re: Capture Groups
in thread Capture Groups

I was wondering if this question is related to last year's Split first and last names

No connection at all...
But bonus points for your memory recall and for joining similar dots!

any update on your learnings from your long name parsing journey

I did create an internal discussion document and went on to create a parser that deals with name strings and splits them up reasonably well. But, given that they are difficult to split programmatically with certainty, we parse the string and then show the user how we have split it, allowing them to adjust as their superior human brain sees fit. The exception is where the name is a known first name (looked up from a long list) followed by a single surname. Then we don't show the parse results, but the user can still adjust if they think it's appropriate.
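A minimal sketch of that decision flow, in Python purely for illustration. This is not the real parser: the function names, the naive "last word is the surname" split, and the tiny name set standing in for the long list are all assumptions.

```python
# Hypothetical stand-in for the long list of known first names.
KNOWN_FIRST_NAMES = {"john", "mary", "pablo", "johannes"}


def split_name(full_name):
    """Naive split: the last word is the surname, the rest are given names."""
    parts = full_name.split()
    if not parts:
        return {"given": "", "surname": ""}
    if len(parts) == 1:
        return {"given": parts[0], "surname": ""}
    return {"given": " ".join(parts[:-1]), "surname": parts[-1]}


def needs_review(full_name):
    """True unless the name is one known first name plus one surname."""
    parts = full_name.split()
    is_simple = len(parts) == 2 and parts[0].lower() in KNOWN_FIRST_NAMES
    return not is_simple


for name in ("John Smith", "Pablo Ruiz y Picasso"):
    split = split_name(name)
    if needs_review(name):
        print(f"Show the split to the user for adjustment: {split}")
    else:
        print(f"Accept silently (user can still edit later): {split}")
```

The point of keeping `needs_review` separate from `split_name` is that the split can stay crude, because anything that is not the trivially simple case goes in front of a human anyway.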

It is working well for the low volumes of traffic we currently have.

Our roadmap includes adding AI to this parsing process. It is something that AI should be as good as a human at doing. At least, nearly as good as a human. So far I have written a couple of prompts and fed the AI a variety of tricky names to split up into their component parts and the results look promising. The AI is formatting them nicely as JSON so we should be able to deal with the results.
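For what it's worth, "deal with the results" could look something like the sketch below, in Python purely for illustration. The field names are a hypothetical schema, not what the prompts actually return, and the reply is assumed to arrive as a plain JSON string. The idea is just to never trust the reply blindly: reject anything that isn't a JSON object with string fields, and keep whatever the model adds outside the schema so a human can look at it.

```python
import json

# Hypothetical schema: these field names are assumptions for illustration.
EXPECTED_FIELDS = ("title", "given_names", "prefix", "family_name", "suffix")


def load_ai_split(reply_text):
    """Return a dict with every expected field, or None if the reply is unusable."""
    try:
        data = json.loads(reply_text)
    except json.JSONDecodeError:
        return None
    if not isinstance(data, dict):
        return None

    result = {}
    for field in EXPECTED_FIELDS:
        value = data.get(field, "")  # a missing field becomes an empty string
        if not isinstance(value, str):
            return None
        result[field] = value.strip()

    # Keep anything outside the schema so a human reviewer can see it.
    result["extra"] = {k: v for k, v in data.items() if k not in EXPECTED_FIELDS}
    return result


reply = '{"given_names": "Pablo", "family_name": "Ruiz y Picasso", "note": "two family names"}'
print(load_ai_split(reply))
print(load_ai_split("Sorry, I cannot parse that name."))
```

A `None` result would simply fall back to the existing parser plus the user-adjustment screen.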

It's parsed Johannes Adam Ferdinand Alois Josef Maria Marko d'Aviano Pius von und zu Liechtenstein and told me that I don't have enough fields to properly accommodate all the constituent parts. But I doubt the ruler of Liechtenstein will need us to store his name!


Replies are listed 'Best First'.
Re: Splitting names revisited (was: Re^2: Capture Groups)
by NetWallah (Canon) on Nov 15, 2023 at 00:07 UTC
    Can it handle Picasso's full name? (̿▀̿‿ ̿▀̿ ̿)
    Pablo Diego José Francisco de Paula Juan Nepomuceno María de los Remedios Cipriano de la Santísima Trinidad Ruiz y Picasso

                    "If it happens once, it's a bug. If it happens twice, it's a feature. If it happens more than twice, it's a design philosophy."

      Ah, multi-word given names (some of them the full name of a saint) and the Spanish custom of using both parents' family names.

      (I didn't find a web reference to "Cipriano de la Santísima Trinidad", but assume it to refer to some "Cyprian of the Holy(est) Trinity")