
How could I improve it?

In general terms, the best thing you could do is to put the subroutine into a module and then write a test script (e.g. using Test::More) which compares the output for your set of test inputs against the output you require. This allows you to add extra functionality later while catching regressions.
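
As a rough sketch of what such a test script might look like: the sub is inlined here only to keep the example self-contained (in practice it would be loaded from your module), and the expected strings are simply what the shortened code further down produces, so replace them with the output you actually require.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Test::More;

# Inlined for the sake of a self-contained example; normally this
# would come from your own module via "use".
sub translate {
    my $str = shift;
    $str =~ tr/a-zA-Z/ /cs;    # each run of non-letters becomes one space
    my @words = split( /\s+/, $str );
    foreach my $w (@words) {
        $w =~ s/(\p{isLower})(\p{isUpper})/$1 $2/g or
            $w = ucfirst( lc($w) );
    }
    return join( ' ', @words );
}

# input => expected output (assumed values, adjust to your requirements)
my %expected = (
    'Ackermann-Hirschi L.' => 'Ackermann Hirschi L',
    'AlvarezGonzalez B.'   => 'Alvarez Gonzalez B',
    'della Volpe D.'       => 'Della Volpe D',
    "Dell'Acqua A."        => 'Dell Acqua A',
    'Yao W. -M.'           => 'Yao W M',
);

for my $input ( sort keys %expected ) {
    is( translate($input), $expected{$input}, "translate('$input')" );
}

done_testing();
```

Run it with prove (or plain perl). Each new name that turns up in your data becomes one more line in the hash.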

Here are a couple of specific suggestions, though. When I saw this:

$str =~ tr/-/ /;           #replace - with a space
$str =~ tr/a-zA-Z/ /cs;    #replace non letter with a space

I spent some time wondering why the first statement was there when the second seems to make it redundant: a hyphen is itself a non-letter, so the second tr/// already turns it into a space. Why not remove the first one?
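
If you want to convince yourself, a quick check along these lines should do. The two sub names are made up for this illustration, and the sample names are taken from your data:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helpers, for comparison only.
sub strip_with_both {
    my $s = shift;
    $s =~ tr/-/ /;           # hyphen to space
    $s =~ tr/a-zA-Z/ /cs;    # runs of non-letters to one space
    return $s;
}

sub strip_second_only {
    my $s = shift;
    $s =~ tr/a-zA-Z/ /cs;    # the hyphen is a non-letter too
    return $s;
}

for my $name ( 'Ackermann-Hirschi L.', 'Yao W. -M.', "Dell'Acqua A." ) {
    my $same = strip_with_both($name) eq strip_second_only($name) ? 'yes' : 'no';
    printf "%-22s same result: %s\n", $name, $same;
}
```

Because of the /c and /s modifiers, the space produced by the first statement is itself translated and squeezed by the second, so both versions end up with the same string.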

Also, I think the if-block for breaking camelCase could be greatly simplified. For example:

$w =~ s/(\p{isLower})(\p{isUpper})/$1 $2/g or
    $w = ucfirst( lc($w) );

This shortens the code to:

#!/usr/bin/perl
use strict;
use warnings;

while ( my $t = <DATA> ) {
    chomp $t;
    printf "orig: %-30s translated: %s\n", $t, translate($t);

}

sub translate {
    my $str = shift;
    $str =~ tr/a-zA-Z/ /cs;    # replace each run of non-letters with one space
    my @words = split( /\s+/, $str );
    foreach my $w (@words) {
        # insert a space where an upper-case letter follows a lower-case
        # one; otherwise normalise the case of the word. Assigning to $w
        # changes @words because foreach aliases each element.
        $w =~ s/(\p{isLower})(\p{isUpper})/$1 $2/g or
            $w = ucfirst( lc($w) );
    }
    return join( ' ', @words );
}
__DATA__
Acierno James S., Jr.
Acierno James, Jr.
Ackermann-Hirschi L.
Agatonovic-Jovini T.
Alba-Castro Jose-Luis
Alconada Verzini M. J.
AlconadaVerzini M. J.
Alvarez Fernandez A.
Alvarez-Bolado Gonzalo
Alvarez-Gonzalez B.
AlvarezGonzalez B.
AlvarezPiqueras D
Amor Dos Santos S. P.
Amor DosSantos S. P.
AmorDosSantos S. P
da Costa F. Barreiro Guimaraes
Dano Hoffmann M.
DanoHoffmann M.
Dell' Acqua A.
Dell' Asta L.
Dell'Acqua A.
Dell'Asta L.
Dell'Omo Giacomo
della Volp D.
della Volpe D.
Della Volpe D.
DeRegie J. B. De Vivie
Derendarz D.
deRenstrom P. A. Bruckman
Dupl'akova Nikoleta
Duplakova Nikoleta
Faucci Giannelli M.
Fauccigiannelli M.
FaucciGiannelli M.
Yusuff I.
Yusuff' I.
Yao W-M
Yao W-M.
Yao W. -M
Yao W. -M.

HTH.

(Edited to fix the Test::More link - thanks Laurent_R and kcott for pointing this out)

In reply to Re: regex: help for improvement by hippo
in thread regex: help for improvement by frazap
