Re: umlauts, special chars in perl regular expressions

Answer: it depends on how the data is encoded. If it is utf8, \w will use utf8 rules for what counts as a letter (though this has been hotly debated; you may be better off using \p{Word} instead; see 5.8's perlre).

If the string contains characters in the 128-255 range but is not utf8 encoded, you can either make it so (see utf8), or add "use locale;" and have the LANG environment variable set to a suitable locale.
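A minimal sketch of the "use locale;" route, in case it helps. The locale name "de_DE.ISO-8859-1" and the helper matches_under_locale are assumptions made up for illustration; locale names differ between systems, so the sketch reports "unavailable" when setlocale cannot honour the request. It sets the locale explicitly with POSIX::setlocale rather than relying on LANG.

```perl
use strict;
use warnings;
use POSIX qw(setlocale LC_CTYPE);
use locale;    # regex classes such as \w follow LC_CTYPE in this scope

# Hypothetical helper: tries to switch LC_CTYPE to the named locale and
# reports whether a Latin-1 byte string consists only of \w characters.
sub matches_under_locale {
    my ($locale_name) = @_;

    # setlocale returns undef if the system does not know the locale.
    my $set = setlocale(LC_CTYPE, $locale_name);
    return "unavailable" unless defined $set;

    my $bytes = "e1\xF1e";    # 'e1ñe' as Latin-1 bytes, not utf8 encoded
    return $bytes =~ /^\w+$/ ? "yes" : "no";
}

# Assumed locale name; substitute one that "locale -a" lists on your system.
print matches_under_locale("de_DE.ISO-8859-1"), "\n";
```

What this prints depends on which locales are installed, so treat it as something to try rather than a guaranteed result.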

Re: Re: umlauts, special chars in perl regular expressions by amonroy (Scribe) on Apr 21, 2004 at 23:32 UTC
How do you make sure a string is utf-8 encoded? I tried this and I don't get what I would expect.

```
my $string = 'e1ñe';
if ($string =~ /^\w+$/) {
    print "yes";
} else {
    print "no";
}
print "\n";
__OUTPUT__
yes
```
Re: Re: Re: umlauts, special chars in perl regular expressions by ysth (Canon) on Apr 22, 2004 at 01:50 UTC
Some of the ways:

```
$outstr = $instr;
utf8::upgrade($outstr);

# or (needs "use Encode;")
$outstr = Encode::decode("latin-1", $instr);

# or add and remove a utf8 character:
$outstr = $instr . "\x{100}";
chop $outstr;
```
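To see the effect on \w, here is a small sketch that compares the same Latin-1 string before and after utf8::upgrade. The helper is_word and the variable names are made up for illustration, and it assumes no "use locale", no "use utf8" and no unicode_strings feature are in effect.

```perl
use strict;
use warnings;

# Hypothetical helper: "yes" if the whole string is made of \w characters.
sub is_word {
    my ($s) = @_;
    return $s =~ /^\w+$/ ? "yes" : "no";
}

# 'e1ñe' written as Latin-1 bytes; \xF1 is ñ and lies in the 128-255 range.
my $bytes = "e1\xF1e";

# Same characters, but stored internally as utf8.
my $upgraded = $bytes;
utf8::upgrade($upgraded);

print "not upgraded: ", is_word($bytes),    "\n";
print "upgraded:     ", is_word($upgraded), "\n";
```

Under those assumptions the first line should report "no" and the second "yes", even though the two strings still compare equal with eq.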