D:\>chcp
Active code page: 1252
D:\>type 889023.pl
#!perl
use strict;
use warnings;
use Lingua::EN::NameCase;
binmode DATA, ':encoding(ISO-8859-1)';
binmode STDOUT, ':encoding(Windows-1252)';
while (my $original_name = <DATA>) {
    chomp $original_name;
    my $normalized_name = nc($original_name);
    printf "%30s %s\n", $original_name, $normalized_name;
}
__DATA__
MARILYN MCCORD ADAMS
D'ALEMBERT, JEAN
ÉTIENNE DE LA BOÉTIE
ÉMILIE DU CHÂTELET
HÉLÈNE CIXOUS
DESCARTES, RENÉ
durkheim, émile
FREUD, SIGMUND
GÖDEL, KURT
þorsteinn gylfason
OLIVER WENDELL HOLMES, JR.
JUNG, CARL
KANT, IMMANUEL
MACHIAVELLI, NICCOLÒ
MARX, KARL
NIETZSCHE, FRIEDRICH
ROUSSEAU, JEAN-JACQUES
SARTRE, JEAN-PAUL
SCHOPENHAUER, ARTHUR
ANNE LOUISE GERMAINE DE STAËL
D:\>perl 889023.pl
          MARILYN MCCORD ADAMS Marilyn McCord Adams
              D'ALEMBERT, JEAN D'Alembert, Jean
          ÉTIENNE DE LA BOÉTIE Étienne de la Boétie
            ÉMILIE DU CHÂTELET Émilie du Châtelet
                 HÉLÈNE CIXOUS Hélène Cixous
               DESCARTES, RENÉ Descartes, René
               durkheim, émile Durkheim, Émile
                FREUD, SIGMUND Freud, Sigmund
                   GÖDEL, KURT Gödel, Kurt
            þorsteinn gylfason Þorsteinn Gylfason
    OLIVER WENDELL HOLMES, JR. Oliver Wendell Holmes, Jr.
                    JUNG, CARL Jung, Carl
                KANT, IMMANUEL Kant, Immanuel
          MACHIAVELLI, NICCOLÒ Machiavelli, Niccolò
                    MARX, KARL Marx, Karl
          NIETZSCHE, FRIEDRICH Nietzsche, Friedrich
        ROUSSEAU, JEAN-JACQUES Rousseau, Jean-Jacques
             SARTRE, JEAN-PAUL Sartre, Jean-Paul
          SCHOPENHAUER, ARTHUR Schopenhauer, Arthur
 ANNE LOUISE GERMAINE DE STAËL Anne Louise Germaine de Staël
D:\>
When I remove the two calls to binmode, the script produces the same output. This is because Lingua::EN::NameCase enables the locale pragma (use locale), so its case conversions follow the system locale. So although wind wrote, "It's not going to help you with your special character issue," the truth is that the module does handle these characters correctly, at least on a Microsoft Windows computer with the right code page and regional (i.e., locale) settings. Even so, it's better and safer to be explicit about the character encodings in your Perl script.
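To illustrate what "being explicit" buys you, here is a minimal sketch that does not depend on the locale at all: it decodes the input bytes into characters, changes case on the characters, and encodes the result again. The helper name ucfirst_latin1_bytes is hypothetical, and I am assuming ISO-8859-1 input and Windows-1252 (cp1252) output, as in the script above. It is not how the module works internally.

```perl
#!perl
use strict;
use warnings;
use feature 'unicode_strings';   # Perl 5.12+: Unicode case rules for all strings
use Encode qw(decode encode);

# Hypothetical helper (not part of Lingua::EN::NameCase):
# take ISO-8859-1 bytes, capitalize the first letter and lowercase
# the rest, and return the result as cp1252 (Windows-1252) bytes.
sub ucfirst_latin1_bytes {
    my ($bytes) = @_;
    my $chars = decode('ISO-8859-1', $bytes);   # bytes -> characters
    my $fixed = ucfirst lc $chars;              # case-map on characters
    return encode('cp1252', $fixed);            # characters -> bytes
}

# 0xE9 is "é" in ISO-8859-1; 0xC9 is "É" in both encodings.
my $out = ucfirst_latin1_bytes("\xE9MILE");
print join(' ', map { sprintf '%02X', ord } split //, $out), "\n";
# prints: C9 6D 69 6C 65
```

Because the case change happens on decoded characters, the result is the same whatever the console code page or regional settings happen to be.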
The module converts MCCORD to McCord, but it cleverly does not convert MACHIAVELLI to MacHiavelli. Perché no? Because Machiavelli ends with an i, so the module rightly surmises it's an Italian name. Nice.
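The behavior described above can be sketched roughly as follows. This is a simplified illustration of the idea only, not the module's actual code: the function name mac_case is hypothetical, it handles a single ASCII word, and the only exception it knows is the "ends with an i" rule mentioned here. The module's real rule set is more extensive.

```perl
#!perl
use strict;
use warnings;

# Hypothetical, simplified illustration; NOT Lingua::EN::NameCase itself.
sub mac_case {
    my ($word) = @_;
    my $name = ucfirst lc $word;    # MCCORD -> Mccord, MACHIAVELLI -> Machiavelli

    # Capitalize the letter after Mc/Mac, unless a Mac... name ends in "i",
    # which is taken here as a hint that the name is Italian.
    if ($name =~ /^Mc/ or $name =~ /^Mac[a-z]{2,}[^i]$/) {
        $name =~ s/^(Ma?c)([a-z])/$1\u$2/;
    }
    return $name;
}

print mac_case($_), "\n" for qw(MCCORD MACDONALD MACHIAVELLI MARX);
# prints McCord, MacDonald, Machiavelli, Marx (one per line)
```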
My favorite name in the list is Þorsteinn Gylfason, converted from all lowercase letters, þorsteinn gylfason. (See þorn.info.)