frazap has asked for the wisdom of the Perl Monks concerning the following question:
I came up with the code below, which seems to work (as far as I can test it).
My questions: how could I improve it? (I think it will break with Unicode characters; what changes should I make to get it to work with any character set?)
Thanks
François
use strict;
use warnings;

while ( my $t = <DATA> ) {
    chomp $t;
    printf "orig: %-30s translated: %s\n", $t, translate($t);
}

sub translate {
    my $str = shift;
    $str =~ tr/-/ /;            #replace - with a space
    $str =~ tr/a-zA-Z/ /cs;     #replace non letter with a space
    my @words = split( /\s+/, $str );
    foreach my $w (@words) {

        #insert a space when a upper case is inside a word
        if ( $w =~ /\p{isLower}\p{isUpper}/ ) {
            my @all;
            while ( $w =~ m/\G(\p{isUpper}*\p{isLower}+)/g ) {
                push @all, $1;
            }
            $w = join( " ", @all );
        }
        else {
            $w = ucfirst( lc($w) );    # we are using side effect of foreach loop
        }
    }
    return join( ' ', @words );
}

__DATA__
Acierno James S., Jr.
Acierno James, Jr.
Ackermann-Hirschi L.
Agatonovic-Jovini T.
Alba-Castro Jose-Luis
Alconada Verzini M. J.
AlconadaVerzini M. J.
Alvarez Fernandez A.
Alvarez-Bolado Gonzalo
Alvarez-Gonzalez B.
AlvarezGonzalez B.
AlvarezPiqueras D
Amor Dos Santos S. P.
Amor DosSantos S. P.
AmorDosSantos S. P
da Costa F. Barreiro Guimaraes
Dano Hoffmann M.
DanoHoffmann M.
Dell' Acqua A.
Dell' Asta L.
Dell'Acqua A.
Dell'Asta L.
Dell'Omo Giacomo
della Volp D.
della Volpe D.
Della Volpe D.
DeRegie J. B. De Vivie
Derendarz D.
deRenstrom P. A. Bruckman
Dupl'akova Nikoleta
Duplakova Nikoleta
Faucci Giannelli M.
Fauccigiannelli M.
FaucciGiannelli M.
Yusuff I.
Yusuff' I.
Yao W-M
Yao W-M.
Yao W. -M
Yao W. -M.
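On the Unicode question, here is a minimal sketch of one possible variant, not a drop-in replacement. It assumes the input has already been decoded into Perl character strings (for example by reading through an `:encoding(UTF-8)` layer). The name `translate_unicode` and the sample names are hypothetical. The idea is to replace the ASCII-only `tr/a-zA-Z/ /cs` with the Unicode properties `\p{L}` (letter) and `\p{M}` (combining mark), and to split at each lower-to-upper boundary with a substitution instead of the `\G` loop. Unlike the original, this keeps every character of a mixed-case word.

```perl
use strict;
use warnings;
use utf8;                          # the string literals in this file are UTF-8
use feature 'unicode_strings';     # lc/ucfirst use Unicode rules on every string (Perl 5.12+)

binmode STDOUT, ':encoding(UTF-8)';

# Hypothetical Unicode-aware counterpart of translate().
sub translate_unicode {
    my ($str) = @_;

    # Any run of characters that are neither letters nor combining marks
    # (hyphens, dots, commas, apostrophes, spaces) becomes a single space.
    $str =~ s/[^\p{L}\p{M}]+/ /g;

    # split ' ' also skips leading whitespace, so no empty first field.
    my @words = split ' ', $str;

    for my $w (@words) {
        if ( $w =~ /\p{Ll}\p{Lu}/ ) {
            # Insert a space at every lower-case/upper-case boundary.
            $w =~ s/(?<=\p{Ll})(?=\p{Lu})/ /g;
        }
        else {
            $w = ucfirst lc $w;    # modifies @words through the loop alias
        }
    }
    return join ' ', @words;
}

# Sample names made up for illustration.
for my $name ( 'Álvarez-González B.', 'ÁlvarezGonzález B.', 'müller LÜDENSCHEIDT' ) {
    printf "orig: %-30s translated: %s\n", $name, translate_unicode($name);
}
```

When reading real data from a file or from `DATA`, the handle would need a matching decoding layer (for example `binmode DATA, ':encoding(UTF-8)'`), otherwise the properties are matched against raw bytes rather than characters.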
Replies are listed 'Best First'.

Re: regex: help for improvement
by hippo (Archbishop) on Dec 14, 2018 at 09:14 UTC

Re: regex: help for improvement
by choroba (Cardinal) on Dec 14, 2018 at 09:22 UTC
by frazap (Monk) on Dec 14, 2018 at 14:11 UTC
by choroba (Cardinal) on Dec 14, 2018 at 14:35 UTC
by frazap (Monk) on Dec 14, 2018 at 15:25 UTC
by AnomalousMonk (Archbishop) on Dec 14, 2018 at 19:21 UTC
by Laurent_R (Canon) on Dec 14, 2018 at 18:16 UTC

Re: regex: help for improvement
by Eily (Monsignor) on Dec 14, 2018 at 09:37 UTC
by AnomalousMonk (Archbishop) on Dec 14, 2018 at 19:40 UTC
by frazap (Monk) on Dec 14, 2018 at 14:33 UTC
by Laurent_R (Canon) on Dec 14, 2018 at 18:23 UTC
by frazap (Monk) on Dec 20, 2018 at 14:05 UTC
by 1nickt (Canon) on Dec 20, 2018 at 14:31 UTC