s/(\p{Han}+?)\((\p{Hiragana}+?)\)/\\ruby{$1}{$2}/g;
Running your code with this regex yields:
This is English text with \ruby{日本語}{にほんご} mixed in. To test multi-furi text: \ruby{繰}{く}り\ruby{返}{かえ}し
I hope that's correct.
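To make the pattern easier to read, here is the same substitution written with the `/x` modifier so each part can carry a comment. This is only a sketch: the sub name `to_ruby` is a hypothetical helper, and it assumes the input is already a decoded character string (the script in the update gets that from `binmode DATA, ':encoding(UTF-8)'`).

```perl
use strict;
use warnings;
use utf8;                                   # the string literals below are UTF-8 source
binmode STDOUT, ':encoding(UTF-8)';

# Hypothetical helper: turn every "kanji(reading)" pair into \ruby{kanji}{reading}.
sub to_ruby {
    my ($text) = @_;
    $text =~ s/
        ( \p{Han}+? )          # group 1: one or more Han (kanji) characters
        \(                     # a literal opening parenthesis
        ( \p{Hiragana}+? )     # group 2: the hiragana reading
        \)                     # a literal closing parenthesis
    /\\ruby{$1}{$2}/gx;
    return $text;
}

print to_ruby('繰(く)り返(かえ)し'), "\n";   # prints \ruby{繰}{く}り\ruby{返}{かえ}し
```

Because the match always starts at the leftmost kanji, the non-greedy `+?` still has to grow until it reaches the `(`, so `日本語(にほんご)` is captured whole.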
Update: the output above was produced with perl 5.8.8 on Linux and can be reproduced with perl 5.10.0, using the script below (note that PerlMonks' code tags mangle the non-ASCII example input):
use strict;
use warnings;
use utf8;
binmode DATA,   ':encoding(UTF-8)';
binmode STDOUT, ':encoding(UTF-8)';
while (<DATA>) {
    # wrap each kanji(hiragana) pair in \ruby{kanji}{hiragana}
    s/(\p{Han}+?)\((\p{Hiragana}+?)\)/\\ruby{$1}{$2}/g;
    print;
}
# The multi-furi part should lead to \ruby{繰}{く}り\ruby{返}{かえ}し.
__DATA__
This is English text with 日本語(にほんご) mixed in. To test multi-furi text: 繰(く)り返(かえ)し
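Since the forum mangles non-ASCII input, one workaround is to keep the test source pure ASCII and write every kana and kanji as a `\x{...}` escape. The sketch below does that and compares the results against expected strings built the same way; `to_ruby`, the `@cases` table, and the romaji labels are illustrative additions, not part of the original script.

```perl
use strict;
use warnings;

# The regex from the post, wrapped in a hypothetical helper.
sub to_ruby {
    my ($text) = @_;
    $text =~ s/(\p{Han}+?)\((\p{Hiragana}+?)\)/\\ruby{$1}{$2}/g;
    return $text;
}

# Each case: [label, input, expected]. All characters are \x{...} escapes,
# so this file stays pure ASCII.
my @cases = (
    [ 'nihongo',
      "\x{65E5}\x{672C}\x{8A9E}(\x{306B}\x{307B}\x{3093}\x{3054})",
      "\\ruby{\x{65E5}\x{672C}\x{8A9E}}{\x{306B}\x{307B}\x{3093}\x{3054}}" ],
    [ 'kurikaeshi',
      "\x{7E70}(\x{304F})\x{308A}\x{8FD4}(\x{304B}\x{3048})\x{3057}",
      "\\ruby{\x{7E70}}{\x{304F}}\x{308A}\\ruby{\x{8FD4}}{\x{304B}\x{3048}}\x{3057}" ],
);

# Print a TAP-style line per case; only ASCII is printed, so no binmode is needed.
for my $case (@cases) {
    my ($label, $input, $expected) = @$case;
    my $ok = to_ruby($input) eq $expected;
    print +($ok ? 'ok' : 'not ok'), " - $label\n";
}
```

Both cases should print `ok`. Escaping also sidesteps any question of how the source file or the `DATA` handle is decoded.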
In reply to Re: matching unicode blocks with regular expressions by moritz, in thread matching unicode blocks with regular expressions by Pomax