in reply to Handling caps for surnames with capitals in the middle (was: Irish Surnames)

Maybe even more exciting are the following Dutch surnames:

de Vries
van de Kamp
van Limburg Stirum
van Holthe tot Echten
de Vos tot Nederveen Cappel
or even
Olde Reuver of Briel
Rutten bij- of meergenaamd Verbeek (!)

I guess you'd have to know the nationality of the people involved, and then try some heuristics to detect prefixes like 'Mac', 'Mc', 'van', 'de' (in Dutch, followed by a space), 'De' (in Irish, without a space), etc.
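
For the Dutch names above, a minimal sketch of such a heuristic could look like this. It assumes the particles are always separated by spaces and always stay lowercase, and the particle list is my own guess, certainly not complete. The name handle_caps_dutch is made up for this example.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# ASSUMPTION: a hand-picked, incomplete list of Dutch particles
# that stay lowercase inside a surname.
my %particle = map { $_ => 1 }
    qw(van de der den het ten ter tot of bij- meergenaamd);

# Hypothetical helper: lowercase everything, then capitalise every
# word that is not a known particle.
sub handle_caps_dutch {
    my @words = split ' ', lc shift;
    return join ' ', map { $particle{$_} ? $_ : ucfirst($_) } @words;
}

# Only print the demo when run as a script, not when loaded by other code.
unless (caller) {
    for my $name ('VAN DE KAMP', 'Van Holthe Tot Echten', 'olde reuver of briel') {
        print "$name => ", handle_caps_dutch($name), "\n";
    }
}
```

A lookup table keeps the rule easy to extend, but it will get any name wrong whose real surname part happens to match a particle, so treat it as a starting point only.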

I don't know of any CPAN module that does this, but it sounds like an interesting project :-)

Example
by Joost (Canon) on May 06, 2002 at 12:10 UTC
    Anyway, here's a quick hack to show the heuristic:

    #!/usr/bin/perl -w
    use strict;

    for (qw(mcginley macgee develera)) {
        print "$_ => ".handle_caps($_)."\n";
    }

    sub handle_caps {
        # this assumes irish capitalisation!
        my $name = ucfirst(shift);
        for (qw(Mc Mac)) {    # always ok
            $name =~ s/^$_(.*)/$_\u$1/;
        }
        for (qw(De)) {
            # may not be followed by [aoeiu]
            # ?? don't know enough irish
            # for this rule ;-)
            $name =~ s/^$_([^aoeiu].*)/$_\u$1/;
        }
        return $name;
    }