in reply to regex: seperating parts of non-formatted names
Welcome to the administrative sub-basement of hell. Depending on the names your friend has to deal with, you might discover that a regex can handle 98%, but that the remaining 2% will cause you to run screaming into the night.
Consider my dear friend Lt. Col. J. Random von Perl-Hacker III. By the scheme your friend is using, Randy's name needs to reduce to <Lt. Col.> <J.> <R.> <von Perl-Hacker>. (And it isn't immediately clear what to do with the "III".) In any large set of unstructured names, you're going to run into a few like this. Good luck handling them with a single regexp.
I think you'll have better luck breaking the name into tokens, providing predicate functions that answer whether a token can be of a particular type, then providing a set of "rules" to match a set of tokens against. This will be slower, but potentially much more accurate, than a regex.
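To make that concrete, here is a rough sketch of the token/predicate/rule idea. It is in Python for brevity, though the same structure should port to Perl without much trouble. Everything in it is an assumption for illustration: the word lists (TITLES, PARTICLES, SUFFIXES), the predicate names, the rule format, and the reduce_name helper are all made up here, and real data would need far longer lists and more rules.

```python
import re

# Hypothetical vocabularies; real data would need much longer lists.
TITLES = {"mr", "mrs", "ms", "dr", "prof", "lt", "col", "capt", "sgt"}
PARTICLES = {"von", "van", "de", "der", "la", "le", "di", "da"}
SUFFIXES = {"jr", "sr", "ii", "iii", "iv"}
MANY = 99  # stand-in for "no upper limit" on a slot


def norm(tok):
    """Lower-case a token and drop trailing punctuation for lookups."""
    return tok.rstrip(".,").lower()


# Predicates: can this token be of a particular type?
def is_title(tok):
    return norm(tok) in TITLES

def is_particle(tok):
    return norm(tok) in PARTICLES

def is_suffix(tok):
    return norm(tok) in SUFFIXES

def is_initial(tok):
    return re.fullmatch(r"[A-Za-z]\.?", tok) is not None

def is_name(tok):
    return re.fullmatch(r"[A-Za-z][A-Za-z'-]*", tok) is not None

def is_given(tok):
    return (is_initial(tok) or is_name(tok)) and not (is_particle(tok) or is_suffix(tok))

def is_surname(tok):
    return is_name(tok) and not (is_particle(tok) or is_suffix(tok))


# A rule is a list of slots: (field, predicate, min_count, max_count).
RULES = [
    [("title", is_title, 0, MANY), ("given", is_given, 1, MANY),
     ("particle", is_particle, 0, MANY), ("surname", is_surname, 1, 1),
     ("suffix", is_suffix, 0, MANY)],
    [("given", is_given, 1, 1)],  # single-word names
]


def match(rule, tokens, i=0, j=0):
    """Return {field: tokens} if rule[i:] consumes tokens[j:] exactly, else None."""
    if i == len(rule):
        return {} if j == len(tokens) else None
    field, pred, lo, hi = rule[i]
    n = 0  # how many consecutive tokens this slot could take
    while j + n < len(tokens) and n < hi and pred(tokens[j + n]):
        n += 1
    # Greedy with backtracking: try the longest run first, then shorter ones.
    for k in range(n, lo - 1, -1):
        rest = match(rule, tokens, i + 1, j + k)
        if rest is not None:
            return {field: tokens[j:j + k], **rest}
    return None


def parse_name(text, rules=RULES):
    """Try each rule in order; return the first match, or None if none fits."""
    tokens = text.split()
    for rule in rules:
        fields = match(rule, tokens)
        if fields is not None:
            return fields
    return None


def reduce_name(fields):
    """Reduce to <titles> <initial>... <surname>; the suffix is left to the caller."""
    initials = [g[0].upper() + "." for g in fields.get("given", [])]
    surname = " ".join(fields.get("particle", []) + fields.get("surname", []))
    return [" ".join(fields.get("title", []))] + initials + [surname]
```

With these assumed lists, reduce_name(parse_name("Lt. Col. J. Random von Perl-Hacker III")) comes back as ["Lt. Col.", "J.", "R.", "von Perl-Hacker"], and the "III" stays in the suffix field until you decide what to do with it. A name that matches no rule returns None, which gives you a clean way to set the ugly 2% aside for a human.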