
If you happen to have "use utf8" anywhere in your script, that is what is triggering the error messages. Your old code contains single-byte renderings of the accented characters, in whatever character set is native to your data and editor (latin1? cp-something-or-other?).

If you don't have "use utf8" anywhere in the script, then there is probably something in your environment that is setting the locale in such a way as to make Perl assume that "use utf8" ought to be in effect.
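A quick way to look at the environment side of this is to print the locale variables that mention UTF-8. This is only a rough sketch: the helper name utf8_locale_settings is made up for illustration, and it assumes the relevant setting lives in one of LC_ALL, LC_CTYPE or LANG, which may not be the whole story on your system.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of any module): given a hash ref of
# environment variables, return "NAME=value" for each of the usual
# locale variables whose value mentions UTF-8.
sub utf8_locale_settings {
    my ($env) = @_;
    return map  { "$_=$env->{$_}" }
           grep { defined $env->{$_} && $env->{$_} =~ /utf-?8/i }
           qw(LC_ALL LC_CTYPE LANG);
}

# Report on the environment the script is actually running in.
print "$_\n" for utf8_locale_settings(\%ENV);
```

If this prints anything, a UTF-8 locale is a plausible suspect, and running the script once with those variables set to "C" would be a cheap way to test that theory.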

Anyway, if you put "no utf8" in the script, the problem should go away. Alternatively, if you assign that string of single-byte accented characters to a scalar and use the Encode::decode() function to create a utf8 version of the string, you should then be able to use the utf8 string in the regex:

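use Encode;
...
my $s = 'äöüÄÖÜß';              # single-byte (latin1) characters, as saved by your editor
my $u = decode('latin1', $s);   # the same characters as a Perl utf8 string
my $patternWUmlauts = qr/[\w$u]+/;
...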

update: Of course, you would only use the decode approach if the data to be tested against the regex is already utf8, or if you want to make sure to produce utf8 output from the data. In the latter case, input data that happens to be single-byte would need to be decoded as well, before hitting it with the utf8 version of the regex. If the input is still single-byte, and you still want the output to be single-byte, just say "no utf8".
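To make the "decode the input as well" case concrete, here is a minimal sketch. It assumes the legacy data is latin1; the sample input string and the variable names other than $patternWUmlauts are invented for illustration. The bytes are written as \x escapes so the example does not depend on how the script file itself is saved.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# The accented characters as latin1 bytes (äöüÄÖÜß).
my $latin1_umlauts = "\xE4\xF6\xFC\xC4\xD6\xDC\xDF";
my $u = decode('latin1', $latin1_umlauts);
my $patternWUmlauts = qr/[\w$u]+/;

# Hypothetical single-byte input: "Grüße aus Köln" in latin1.
my $raw_input = "Gr\xFC\xDFe aus K\xF6ln";

# Decode the input first, so both the regex and the data are character strings.
my $text = decode('latin1', $raw_input);

# Collect every run of word characters and umlauts: three words here.
my @words = $text =~ /($patternWUmlauts)/g;

# Encode explicitly on the way out to get UTF-8 bytes.
my @utf8_words = map { encode('UTF-8', $_) } @words;

print join("\n", @utf8_words), "\n";
```

The point is that decoding happens once on input and encoding once on output, so everything in between works on characters rather than bytes.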

In reply to Re: legacy code, utf8 and Perl 5.8.0 by graff
in thread legacy code, utf8 and Perl 5.8.0 by barrachois
