in reply to Strip utf-8 dangerous url chars

\w matches more than what Unicode considers "letters": it also matches decimal digits, the underscore, and combining marks. If you want to use Unicode's own definitions, that's easy to do, since Unicode provides properties relevant to your question.

L     Letter
Nd    Number, Decimal Digit

So you'd use

s/[^\p{L}\p{Nd}_]//g

Update: "^" was missing!
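A minimal sketch of that substitution in use. The helper name strip_unsafe and the sample string are made up for illustration, and the input is assumed to already be a decoded character string (not raw UTF-8 bytes), since \p{L} classifies characters rather than bytes:

```perl
use strict;
use warnings;

# Hypothetical helper: keep only Unicode letters, decimal digits and "_".
sub strip_unsafe {
    my ($s) = @_;
    $s =~ s/[^\p{L}\p{Nd}_]//g;
    return $s;
}

# Sample input written with \x{...} escapes so the script does not depend
# on the encoding of the source file. The first six characters are the
# Cyrillic letters of the Russian word for "hello".
my $in  = "\x{43f}\x{440}\x{438}\x{432}\x{435}\x{442}, hello_42?";
my $out = strip_unsafe($in);

binmode STDOUT, ':encoding(UTF-8)';
print "$out\n";    # the six Cyrillic letters followed by "hello_42"
```

The comma, the space and the question mark are removed; the Cyrillic letters, the ASCII letters, the underscore and the digits survive.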

Re^2: Strip utf-8 dangerous url chars
by AlfaProject (Beadle) on Apr 04, 2011 at 11:43 UTC
    I tried to use [\W] and also [\p{L}].
    They work well for all languages, but when I try to use them in a web environment they don't work.
    With [\p{L}] I get ����� (black triangles with question marks), and [\W] doesn't work at all: it just removes all the letters that are not English.

      First, I forgot the "^" in my pattern.

      Secondly, Perl doesn't know or care whether it was launched by a web server. Blaming this for the change in behaviour is misdirected. Obviously, the strings you are trying to process are not the same. Start by finding the difference in the strings using

      { use Data::Dumper; local $Data::Dumper::Useqq = 1; print Dumper($s); }
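As a rough illustration of what that dump can reveal, here is a sketch comparing the same text held as a decoded character string and as raw UTF-8 bytes. That this is the difference between the two environments is only an assumption; the variable names and the sample text are made up:

```perl
use strict;
use warnings;
use Data::Dumper;
use Encode qw(encode);

$Data::Dumper::Useqq = 1;    # print non-ASCII data as escapes, not raw

# The same two Cyrillic letters, once as characters and once as UTF-8 bytes.
my $chars = "\x{43f}\x{440}";
my $bytes = encode('UTF-8', $chars);

# With Useqq the two dumps look different even though both strings would
# render identically on a UTF-8 terminal: the first shows wide-character
# escapes, the second shows one escape per byte.
print Dumper($chars);
print Dumper($bytes);

# The lengths differ too: 2 characters versus 4 bytes.
printf "chars: %d, bytes: %d\n", length($chars), length($bytes);
```

If the string seen under the web server dumps like $bytes here, then \p{L} is being applied to individual bytes rather than to characters, which would be consistent with the mangled output described above.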