How could I improve it?
The best thing you could do in general terms is to put the subroutine into a module and then write a test script (e.g. using Test::More) which would run your test set of input data through it and compare the results with your required set of output data. This allows you to add extra functionality later while catching regressions.
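A minimal sketch of such a test script follows. To keep it self-contained, translate() is defined inline here; in a real layout it would be imported from your module. The expected strings are assumptions: they are simply what the simplified routine in this post returns for those inputs, so replace them with the output you actually require.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Test::More;

# Stand-in for the routine under test. In practice this would live in
# a module and be imported here instead of being defined inline.
sub translate {
    my $str = shift;
    $str =~ tr/a-zA-Z/ /cs;    # replace runs of non-letters with one space
    my @words = split /\s+/, $str;
    foreach my $w (@words) {
        $w =~ s/(\p{IsLower})(\p{IsUpper})/$1 $2/g
          or $w = ucfirst( lc($w) );
    }
    return join ' ', @words;
}

# Each case is [ input, expected output ]. The expected values are
# assumptions for illustration, not the original poster's requirements.
my @cases = (
    [ 'AlvarezGonzalez B.', 'Alvarez Gonzalez B' ],
    [ "Dell'Acqua A.",      'Dell Acqua A' ],
    [ 'della Volpe D.',     'Della Volpe D' ],
    [ 'Yao W. -M.',         'Yao W M' ],
);

is( translate( $_->[0] ), $_->[1], "translate('$_->[0]')" ) for @cases;

done_testing();
```

Run it with prove or plain perl. Each new requirement then becomes one more row in @cases.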
Here are a couple of specific suggestions, though. When I saw this:
$str =~ tr/-/ /;           #replace - with a space
$str =~ tr/a-zA-Z/ /cs;    #replace non letter with a space
I took some time to wonder why the first statement was there when the second statement seemed to render it redundant: a hyphen is itself a non-letter, so the second tr already turns it into a space. Why not remove the top one?
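If you want to convince yourself before deleting it, a quick check along these lines compares the two variants. The helper names with_both and second_only are made up for this sketch, and the sample names are taken from the data set in this thread.

```perl
use strict;
use warnings;

# Both tr statements, as in the original code.
sub with_both {
    my $s = shift;
    $s =~ tr/-/ /;
    $s =~ tr/a-zA-Z/ /cs;
    return $s;
}

# Only the second tr statement.
sub second_only {
    my $s = shift;
    $s =~ tr/a-zA-Z/ /cs;
    return $s;
}

for my $name ( 'Ackermann-Hirschi L.', 'Yao W. -M.', "Dell' Acqua A." ) {
    my $verdict = with_both($name) eq second_only($name) ? 'same' : 'differs';
    printf "%-22s %s\n", $name, $verdict;
}
```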
Also, I think the if-block for breaking camelCase could be greatly simplified, e.g.:
$w =~ s/(\p{isLower})(\p{isUpper})/$1 $2/g or $w = ucfirst( lc($w) );
This shortens the code to:
#!/usr/bin/perl
use strict;
use warnings;

while ( my $t = <DATA> ) {
    chomp $t;
    printf "orig: %-30s translated: %s\n", $t, translate($t);
}

sub translate {
    my $str = shift;
    $str =~ tr/a-zA-Z/ /cs;    #replace non letter with a space
    my @words = split( /\s+/, $str );
    foreach my $w (@words) {
        #insert a space when a upper case is inside a word
        $w =~ s/(\p{isLower})(\p{isUpper})/$1 $2/g
          or $w = ucfirst( lc($w) );    # we are using side effect of foreach loop
    }
    return join( ' ', @words );
}

__DATA__
Acierno James S., Jr.
Acierno James, Jr.
Ackermann-Hirschi L.
Agatonovic-Jovini T.
Alba-Castro Jose-Luis
Alconada Verzini M. J.
AlconadaVerzini M. J.
Alvarez Fernandez A.
Alvarez-Bolado Gonzalo
Alvarez-Gonzalez B.
AlvarezGonzalez B.
AlvarezPiqueras D
Amor Dos Santos S. P.
Amor DosSantos S. P.
AmorDosSantos S. P
da Costa F. Barreiro Guimaraes
Dano Hoffmann M.
DanoHoffmann M.
Dell' Acqua A.
Dell' Asta L.
Dell'Acqua A.
Dell'Asta L.
Dell'Omo Giacomo
della Volp D.
della Volpe D.
Della Volpe D.
DeRegie J. B. De Vivie
Derendarz D.
deRenstrom P. A. Bruckman
Dupl'akova Nikoleta
Duplakova Nikoleta
Faucci Giannelli M.
Fauccigiannelli M.
FaucciGiannelli M.
Yusuff I.
Yusuff' I.
Yao W-M
Yao W-M.
Yao W. -M
Yao W. -M.
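The `or` in the substitution line works because s///g returns the number of substitutions it made, which is false when nothing matched, so the ucfirst( lc(...) ) branch only runs for words without a lower-to-upper boundary. A small sketch of that behaviour on single words, with fix_word as a made-up helper name and the branch written out explicitly:

```perl
use strict;
use warnings;

# Hypothetical helper that applies the per-word step to a single word.
sub fix_word {
    my $w = shift;

    # s///g returns the number of substitutions made, or a false
    # value if the pattern did not match at all.
    my $count = ( $w =~ s/(\p{IsLower})(\p{IsUpper})/$1 $2/g );

    # No camelCase boundary found: normalise the case instead.
    $w = ucfirst( lc($w) ) unless $count;

    return $w;
}

printf "%-16s => %s\n", $_, fix_word($_)
  for qw(AlvarezGonzalez della B deRenstrom);
```

One consequence worth checking against your requirements: a word that does contain a boundary skips the ucfirst branch entirely, so deRenstrom becomes "de Renstrom" with the leading "de" left in lower case.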
HTH.
(Edited to fix the Test::More link - thanks Laurent_R and kcott for pointing this out)
In reply to Re: regex: help for improvement by hippo, in thread regex: help for improvement by frazap