wind++. Excellent recommendation!

D:\>chcp
Active code page: 1252

D:\>type 889023.pl
#!perl

use strict;
use warnings;

use Lingua::EN::NameCase;

binmode DATA, ':encoding(ISO-8859-1)';
binmode STDOUT, ':encoding(Windows-1252)';

while (my $original_name = <DATA>) {
    chomp $original_name;
    my $normalized_name = nc($original_name);
    printf "%30s %s\n", $original_name, $normalized_name;
}

__DATA__
MARILYN MCCORD ADAMS
D'ALEMBERT, JEAN
ÉTIENNE DE LA BOÉTIE
ÉMILIE DU CHÂTELET
HÉLÈNE CIXOUS
DESCARTES, RENÉ
durkheim, émile
FREUD, SIGMUND
GÖDEL, KURT
þorsteinn gylfason
OLIVER WENDELL HOLMES, JR.
JUNG, CARL
KANT, IMMANUEL
MACHIAVELLI, NICCOLÒ
MARX, KARL
NIETZSCHE, FRIEDRICH
ROUSSEAU, JEAN-JACQUES
SARTRE, JEAN-PAUL
SCHOPENHAUER, ARTHUR
ANNE LOUISE GERMAINE DE STAËL

D:\>perl 889023.pl
          MARILYN MCCORD ADAMS Marilyn McCord Adams
              D'ALEMBERT, JEAN D'Alembert, Jean
          ÉTIENNE DE LA BOÉTIE Étienne de la Boétie
            ÉMILIE DU CHÂTELET Émilie du Châtelet
                 HÉLÈNE CIXOUS Hélène Cixous
               DESCARTES, RENÉ Descartes, René
               durkheim, émile Durkheim, Émile
                FREUD, SIGMUND Freud, Sigmund
                   GÖDEL, KURT Gödel, Kurt
            þorsteinn gylfason Þorsteinn Gylfason
    OLIVER WENDELL HOLMES, JR. Oliver Wendell Holmes, Jr.
                    JUNG, CARL Jung, Carl
                KANT, IMMANUEL Kant, Immanuel
          MACHIAVELLI, NICCOLÒ Machiavelli, Niccolò
                    MARX, KARL Marx, Karl
          NIETZSCHE, FRIEDRICH Nietzsche, Friedrich
        ROUSSEAU, JEAN-JACQUES Rousseau, Jean-Jacques
             SARTRE, JEAN-PAUL Sartre, Jean-Paul
          SCHOPENHAUER, ARTHUR Schopenhauer, Arthur
 ANNE LOUISE GERMAINE DE STAËL Anne Louise Germaine de Staël

D:\>

When I remove the two calls to binmode, the script produces the same output. That is because Lingua::EN::NameCase has use locale in effect, so its case conversions follow the system locale. So although wind wrote, "It's not going to help you with your special character issue," the truth is that, at least on a Microsoft Windows computer with the right code page and regional (i.e., locale) settings, the module does take care of the accented characters for you. Even so, it's better and safer to be explicit about the character encodings in your Perl script.
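To illustrate what "being explicit" can look like without relying on the locale at all, here is a minimal sketch using only core modules. The helper name lc_latin1 is hypothetical, and the sketch assumes the input really is ISO-8859-1 bytes and that the terminal accepts UTF-8 (on the Windows console above you would keep Windows-1252 for STDOUT instead).

```perl
use strict;
use warnings;
use feature 'unicode_strings';   # Unicode case rules for all character strings
use Encode qw(decode);

# Hypothetical helper (not part of any module): decode ISO-8859-1 bytes
# into Perl characters, then lowercase them without consulting the locale.
sub lc_latin1 {
    my ($bytes) = @_;
    my $chars = decode('ISO-8859-1', $bytes);
    return lc $chars;
}

# "\xC9" is E-acute and "\xC2" is A-circumflex in ISO-8859-1.
my $lower = lc_latin1("\xC9MILIE DU CH\xC2TELET");

# Assumption: the output device expects UTF-8.
binmode STDOUT, ':encoding(UTF-8)';
print "$lower\n";
```

Once the bytes are decoded, lc works on characters rather than bytes, so the result no longer depends on the code page or regional settings of the machine running the script.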

The module converts MCCORD to McCord, but it cleverly does not convert MACHIAVELLI to MacHiavelli. Perché no? (Why not?) Because MACHIAVELLI ends with an i, the module rightly surmises that it's an Italian name. Nice.
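The kind of heuristic described here can be sketched roughly as follows. This is a simplified illustration, not the module's actual code: the function name case_mac_name is hypothetical, it handles a single ASCII surname only, and the "ends in i" test is the only exception it knows about.

```perl
use strict;
use warnings;

# Hypothetical helper, NOT taken from Lingua::EN::NameCase.
# Assumes a single ASCII surname such as "MCCORD" or "MACHIAVELLI".
sub case_mac_name {
    my ($name) = @_;
    my $cased = ucfirst lc $name;    # MCCORD -> Mccord

    # "Mc" prefix: always capitalize the letter that follows it.
    return $cased if $cased =~ s/^Mc([a-z])/Mc\u$1/;

    # "Mac" prefix: capitalize the next letter, unless the name ends
    # in "i", which this sketch treats as a sign of an Italian name.
    $cased =~ s/^Mac([a-z])/Mac\u$1/ unless $cased =~ /i$/;

    return $cased;
}

print case_mac_name($_), "\n" for qw(MCCORD MACHIAVELLI MACDONALD);
```

A rule this crude would still mangle short names such as Mack or Macy, which is why a real name-casing routine needs a longer list of exceptions.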

My favorite name in the list is Þorsteinn Gylfason, converted from all lowercase letters, þorsteinn gylfason. (See þorn.info.)


In reply to Re^2: Unable to lc upper case accented characters by Jim
in thread Unable to lc upper case accented characters by jkeenan1
