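sub tokenize_msg_w_oneregex {
    my ($msg) = @_;
    # One split pattern; a separator is a run of any of:
    #   - characters other than word chars, ', $, !, comma, period and hyphen
    #   - a period or comma preceded by a non-digit
    #   - a period or comma followed by a non-digit or end of string
    # so "3.14" and "3,2" stay whole while sentence punctuation is dropped.
    my $re = qr{(?:[^\w\'\$!,.-]+|(?:(?<=\D)[.,])|(?:[.,](?=\D|$)))+};
    # A hash keeps each distinct token once.
    my %words = map {$_=>1} split $re, $msg;
    return keys %words;
}

Benchmark: timing 10000 iterations of Lists, One Regex, Strings...
     Lists:  4 wallclock secs ( 4.15 usr +  0.00 sys =  4.15 CPU) @ 2409.64/s (n=10000)
 One Regex:  4 wallclock secs ( 3.56 usr +  0.00 sys =  3.56 CPU) @ 2808.99/s (n=10000)
   Strings:  2 wallclock secs ( 2.33 usr +  0.00 sys =  2.33 CPU) @ 4291.85/s (n=10000)
            Rate     Lists One Regex   Strings
Lists     2410/s        --      -14%      -44%
One Regex 2809/s       17%        --      -35%
Strings   4292/s       78%       53%        --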
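To see which tokens the single pattern actually keeps, here is a small driver. It is only a sketch: the sample message is made up for illustration, and the sub is repeated verbatim so the script runs on its own.

```perl
use strict;
use warnings;

# Copy of the sub above, so this sketch is self-contained.
sub tokenize_msg_w_oneregex {
    my ($msg) = @_;
    my $re = qr{(?:[^\w\'\$!,.-]+|(?:(?<=\D)[.,])|(?:[.,](?=\D|$)))+};
    my %words = map { $_ => 1 } split $re, $msg;
    return keys %words;
}

# Hypothetical sample message, chosen to hit each alternative of the pattern.
my $msg = q{Pi is 3.14, not 3,2. Don't pay $5! Pi is not free.};

# keys() order is unspecified, so sort for stable output.
# An explicit comparator avoids "sort SUBNAME LIST" parsing of the call.
my @tokens = sort { $a cmp $b } tokenize_msg_w_oneregex($msg);
print join(' | ', @tokens), "\n";
```

With that message it prints `$5! | 3,2 | 3.14 | Don't | Pi | free | is | not | pay`: separators inside numbers survive, sentence punctuation does not, and `'`, `$` and `!` stay attached to their words. A message that begins with a separator also yields an empty-string key, because `split` keeps a leading empty field.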