use Modern::Perl;
use utf8;

while (<DATA>) {
    # ASCII letters and digits followed by exactly one whitespace character
    my @words = ($_ =~ m/[a-zA-Z0-9]+\s/g);
    say 'Regex : ', join '*', @words;

    # ASCII letters and digits followed by optional whitespace
    @words = ($_ =~ m/[a-zA-Z0-9]+\s*/g);
    say 'Regex *: ', join '*', @words;

    # split on runs of characters that are neither Unicode letters nor numbers
    @words = split /[^\pL\pN]+/;
    say 'split: ', join '*', @words;
}

__DATA__
This is a simple sentence.
This one has punctuation, indeed it has!
And multiple spaces all over the place !
And nön-ascii chàraçtérs, wôw!
What about l33t sp33ch 4 u?
####
Regex : This *is *a *simple
Regex *: This *is *a *simple *sentence
split: This*is*a*simple*sentence
Regex : This *one *has *indeed *it
Regex *: This *one *has *punctuation*indeed *it *has
split: This*one*has*punctuation*indeed*it*has
Regex : And *multiple *spaces *all *over *the *place
Regex *: And *multiple *spaces *all *over *the *place
split: And*multiple*spaces*all*over*the*place
Regex : And *ascii
Regex *: And *n*n*ascii *ch*ra*t*rs*w*w
split: And*nön*ascii*chàraçtérs*wôw
Regex : What *about *l33t *sp33ch *4
Regex *: What *about *l33t *sp33ch *4 *u
split: What*about*l33t*sp33ch*4*u