I have names, all composed of ASCII characters, that I need to make uniform.

I came up with the code below, which seems to work (as far as I can test it).

My question: how could I improve it? (I think it will break with Unicode characters. What changes should I make to get it to work with any character set?)

Thanks

François

use strict;
use warnings;

while ( my $t = <DATA> ) {
    chomp $t;
    printf "orig: %-30s translated: %s\n", $t, translate($t);
}

sub translate {
    my $str = shift;
    $str =~ tr/-/ /;           # replace - with a space
    $str =~ tr/a-zA-Z/ /cs;    # replace non letter with a space
    my @words = split( /\s+/, $str );
    foreach my $w (@words) {
        # insert a space when an upper case letter is inside a word
        if ( $w =~ /\p{isLower}\p{isUpper}/ ) {
            my @all;
            while ( $w =~ m/\G(\p{isUpper}*\p{isLower}+)/g ) {
                push @all, $1;
            }
            $w = join( " ", @all );
        }
        else {
            $w = ucfirst( lc($w) );    # we are using side effect of foreach loop
        }
    }
    return join( ' ', @words );
}

__DATA__
Acierno James S., Jr.
Acierno James, Jr.
Ackermann-Hirschi L.
Agatonovic-Jovini T.
Alba-Castro Jose-Luis
Alconada Verzini M. J.
AlconadaVerzini M. J.
Alvarez Fernandez A.
Alvarez-Bolado Gonzalo
Alvarez-Gonzalez B.
AlvarezGonzalez B.
AlvarezPiqueras D
Amor Dos Santos S. P.
Amor DosSantos S. P.
AmorDosSantos S. P
da Costa F. Barreiro Guimaraes
Dano Hoffmann M.
DanoHoffmann M.
Dell' Acqua A.
Dell' Asta L.
Dell'Acqua A.
Dell'Asta L.
Dell'Omo Giacomo
della Volp D.
della Volpe D.
Della Volpe D.
DeRegie J. B. De Vivie
Derendarz D.
deRenstrom P. A. Bruckman
Dupl'akova Nikoleta
Duplakova Nikoleta
Faucci Giannelli M.
Fauccigiannelli M.
FaucciGiannelli M.
Yusuff I.
Yusuff' I.
Yao W-M
Yao W-M.
Yao W. -M
Yao W. -M.
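On the Unicode question, one possible direction is sketched below. It is a minimal sketch under stated assumptions, not a tested replacement for the code above. It assumes the input has already been decoded into Perl character strings, and the name translate_unicode is hypothetical. The ASCII-only tr/a-zA-Z/ /cs is swapped for the Unicode property \P{L} (anything that is not a letter), and the \G loop is swapped for a substitution that inserts a space between a lowercase and an uppercase letter, so the camel-case branch works differently from the original. NFC normalization from the core module Unicode::Normalize is an added assumption, so that a base letter followed by a combining accent counts as a single letter.

```perl
use strict;
use warnings;
use utf8;                           # string literals in this file are UTF-8
use Unicode::Normalize qw(NFC);     # core module; composes base letter + accent

binmode( STDOUT, ':encoding(UTF-8)' );    # encode output as UTF-8

# Hypothetical Unicode-aware variant of translate(); not the original method.
sub translate_unicode {
    my $str = NFC(shift);
    $str =~ s/\P{L}+/ /g;           # any run of non-letters (including -) becomes one space
    $str =~ s/^\s+|\s+$//g;         # trim leading and trailing spaces
    my @words = split ' ', $str;
    for my $w (@words) {
        if ( $w =~ /\p{Ll}\p{Lu}/ ) {
            # insert a space between a lowercase and an uppercase letter
            $w =~ s/(?<=\p{Ll})(?=\p{Lu})/ /g;
        }
        else {
            $w = ucfirst lc $w;     # modifies @words through the loop alias
        }
    }
    return join ' ', @words;
}

for my $name ( 'Álvarez-González B.', 'deRenstrom P. A. Bruckman', 'DUPL\'AKOVA Nikoleta' ) {
    printf "orig: %-30s translated: %s\n", $name, translate_unicode($name);
}
```

If the names come from a file or from DATA rather than from literals, the handle would also need a decoding layer, for example binmode( $fh, ':encoding(UTF-8)' ). Without it the properties are matched against raw bytes.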

In reply to regex: help for improvement by frazap
