
That's very curious... using your suggestion still yields no result on my machine, which runs ActiveState's ActivePerl 5.10.0 for x86 Windows. I guess I'll see if installing Strawberry Perl makes a difference =/

update

Installing Strawberry Perl made no difference, it still won't play nice =(

2nd update

It actually does work, but the .pl file itself had not been saved in UTF-8. Hurray for annoying little 'last thing you think of' problems.

Thanks for the help, moritz!

3rd update

Actually, it doesn't work. The inline example you gave works fine (it relies on __DATA__), but when I move the data to a file called "test.txt", saved in UTF-8 encoding, and run the same code on that file instead, it fails again.

use utf8;
open(READ,"test.txt");
@lines = <READ>;
close(READ);
foreach (@lines) {
    $_ =~ s/(\p{Han}+?)\((\p{Hiragana}+?)\)/\\ruby{\1}{\2}/g;
    print;
}
# force file to save as unicode: 日本語

text file:

This is English text with 日本語(にほんご) mixed in. To test multi-furi text: 繰(く)り返(かえ)し should lead to \ruby{繰}{く}り\ruby{返}{かえ}し.

resulting text:

This is English text with 日本語(にほんご) mixed in. To test multi-furi text: 繰(く)り返(かえ)し should lead to \ruby{繰}{く}り\ruby{返}{かえ}し.

... any ideas? =/
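
One thing I haven't ruled out (a guess on my part, not something confirmed in this thread): `use utf8` only tells perl that the script source is UTF-8. It does not decode data read from a filehandle, so lines from a plain `open` would arrive as raw bytes, and `\p{Han}` would have no characters to match. A minimal sketch of that idea, using in-memory data instead of test.txt; `add_ruby` is just a hypothetical helper name for illustration:

```perl
use strict;
use warnings;
use utf8;                        # the script source itself is UTF-8
use Encode qw(decode encode);

# Hypothetical helper: wrap kanji(kana) pairs as \ruby{kanji}{kana}.
sub add_ruby {
    my ($text) = @_;
    $text =~ s/(\p{Han}+?)\((\p{Hiragana}+?)\)/\\ruby{$1}{$2}/g;
    return $text;
}

# Simulate what a filehandle without a decoding layer hands back: raw UTF-8 bytes.
my $bytes = encode('UTF-8', "繰(く)り返(かえ)し");

# On undecoded bytes there are no Han or Hiragana characters, so nothing changes.
my $unchanged = add_ruby($bytes);

# After decoding the bytes into characters, the substitution applies.
my $converted = add_ruby(decode('UTF-8', $bytes));

# Encode on the way out as well, to avoid "Wide character" warnings.
binmode(STDOUT, ':encoding(UTF-8)');
print "$converted\n";
```

If that guess is right, the file version would need the same decoding step, for example by opening with `open(my $fh, '<:encoding(UTF-8)', 'test.txt')` instead of the two-argument `open`, plus the `binmode` call on STDOUT.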

In reply to Re^2: matching unicode blocks with regular expressions by Pomax
in thread matching unicode blocks with regular expressions by Pomax
