in reply to split keywords

Regarding the code that you tried, it looks like it will place "<keyword>" and "</keyword>" around the separators (commas, etc.) as well as around the keywords themselves. It also looks like you need to study "perldoc perlre" a little more regarding the "zero-width positive look-ahead assertion", because the way you're using it here doesn't help with your task.
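To illustrate why the look-ahead doesn't help, here is a small comparison of three split patterns on a toy string. The input "a,b,c" is only an illustration, not your actual data. The look-ahead splits in front of each comma but leaves the comma glued to the next field. A capturing group instead returns each separator as a separate list element.

```perl
use strict;
use warnings;

my $text = "a,b,c";   # toy input, just for illustration

# Plain separator: the commas are consumed and discarded.
my @plain = split /,/, $text;        # ("a", "b", "c")

# Zero-width look-ahead: splits *before* each comma, but the comma
# stays attached to the start of the following field.
my @ahead = split /(?=,)/, $text;    # ("a", ",b", ",c")

# Capturing group: split also returns whatever the group matched,
# so the separators come back as list elements of their own.
my @kept  = split /(,)/, $text;      # ("a", ",", "b", ",", "c")

print join("|", @plain), "\n";
print join("|", @ahead), "\n";
print join("|", @kept),  "\n";
```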

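You want "split" to return all the original input characters, and just put "keyword" tags immediately around each string that does not consist of separator characters. If the separator pattern is wrapped in capturing parentheses, "split" returns the separators as list elements too, so nothing is lost. Then add some logic to the "map" block so that only the non-separator items get tagged, like this: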

    use strict;
    use warnings;

    my $input = "kw1,kw2; kw3 &mdash; kw4&hyphen;kw5";

    # separator is any string consisting of comma, semicolon,
    # &mdash;, &ndash; or &hyphen;, bounded by 0 or more whitespace:
    my $sep = qr{ \s* (?: , | ; | \&(?:[mn]dash|hyphen); ) \s* }x;

    # in the map block, add keyword tags to non-separator items
    my @out = map { /$sep/ ? $_ : "<keyword>$_</keyword>" } split /($sep)/, $input;

    print join "\n", @out, "";

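In this case, whitespace alone will not trigger a split, because $sep only matches where there is a comma, a semicolon or one of the three entities; a single keyword item can therefore contain multiple words separated by whitespace.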
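Here is a quick sketch of that multi-word behaviour, reusing the same $sep. The helper name tag_keywords is hypothetical and exists only for this example. It also adds a "grep { length }" step, which is my own assumption about what you want. When the input begins with a separator, split returns an empty leading field, and without the grep that field would turn into an empty "<keyword></keyword>" pair.

```perl
use strict;
use warnings;

# Same separator pattern as above: comma, semicolon, &mdash;, &ndash; or
# &hyphen;, with any surrounding whitespace absorbed into the separator.
my $sep = qr{ \s* (?: , | ; | \&(?:[mn]dash|hyphen); ) \s* }x;

# tag_keywords is a hypothetical helper name, used only for this sketch.
sub tag_keywords {
    my ($text) = @_;
    # grep { length } drops the empty leading field that split returns
    # when the input begins with a separator (an assumption about what
    # you want; remove it if empty tags are acceptable).
    return join "", map { /$sep/ ? $_ : "<keyword>$_</keyword>" }
                    grep { length } split /($sep)/, $text;
}

# Spaces inside a keyword do not split it:
my $tagged = tag_keywords("red apple, green pear; kiwi");
print "$tagged\n";
```

Because the separators are kept and re-joined with an empty string, the original punctuation and spacing come back unchanged. Only the keyword text gets wrapped.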