```perl
s/(\p{Han}+?)\((\p{Hiragana}+?)\)/\\ruby{$1}{$2}/g;
```

As a complete script:

```perl
use strict;
use warnings;
use utf8;

binmode DATA,   ':encoding(UTF-8)';
binmode STDOUT, ':encoding(UTF-8)';

while (<DATA>) {
    # Turn each "Kanji(hiragana)" pair into \ruby{Kanji}{hiragana}.
    s/(\p{Han}+?)\((\p{Hiragana}+?)\)/\\ruby{$1}{$2}/g;
    print;
}

__DATA__
This is English text with 日本語(にほんご) mixed in.
To test multi-furi text: 繰(く)り返(かえ)し # should lead to \ruby{繰}{く}り\ruby{返}{かえ}し.
```
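To check the substitution on its own, the same regex can be wrapped in a small function and run on fixed strings instead of `__DATA__`. This is a sketch: the name `to_ruby` is hypothetical, and the non-destructive `/r` modifier assumes Perl 5.14 or later.

```perl
use strict;
use warnings;
use utf8;                               # the string literals below are UTF-8
binmode STDOUT, ':encoding(UTF-8)';

# Hypothetical helper: convert every "Kanji(hiragana)" pair into \ruby{Kanji}{hiragana}.
# The /r modifier returns the modified copy and leaves the argument unchanged.
sub to_ruby {
    my ($text) = @_;
    return $text =~ s/(\p{Han}+?)\((\p{Hiragana}+?)\)/\\ruby{$1}{$2}/gr;
}

# Sample inputs taken from the __DATA__ section of the script.
for my $line ('日本語(にほんご)', '繰(く)り返(かえ)し') {
    print to_ruby($line), "\n";
}
```

The lazy `+?` on `\p{Han}` does not trim the match from the left: the engine starts at the leftmost kanji, so `日本語(にほんご)` captures all three characters as one `\ruby` base.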