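\w matches characters that Unicode doesn't consider "letters", such as combining marks and connector punctuation. If you want to use Unicode's definitions instead, that's easy, since Unicode defines general category properties relevant to your question (Perl exposes them as \p{...}):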
L    Letters
Nd   Number, Decimal Digit
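To see where \w and these two properties differ, here is a minimal sketch that tests a few code points against both. The three sample characters are my own picks for illustration, not from the question. Each one is matched by \w but is neither a letter (L) nor a decimal digit (Nd).

```perl
use strict;
use warnings;

# Sample characters (chosen for illustration): \w matches all of them,
# but none is in \p{L} or \p{Nd}, and none is the ASCII underscore.
my %samples = (
    'U+0301 COMBINING ACUTE ACCENT' => "\x{0301}",  # Mn (mark)
    'U+2160 ROMAN NUMERAL ONE'      => "\x{2160}",  # Nl (letter number)
    'U+203F UNDERTIE'               => "\x{203F}",  # Pc (connector punctuation)
);

for my $name (sort keys %samples) {
    my $c = $samples{$name};
    printf "%-30s \\w:%s  [\\p{L}\\p{Nd}_]:%s\n",
        $name,
        ($c =~ /\w/             ? 'yes' : 'no'),
        ($c =~ /[\p{L}\p{Nd}_]/ ? 'yes' : 'no');
}
```

Every line should report "yes" for \w and "no" for the explicit class.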
So you'd use
s/[^\p{L}\p{Nd}_]//g
Update: "^" was missing!
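Since the thread is about UTF-8 input, note that \p{...} works on characters, not bytes, so octets from a URL or file should be decoded first. A minimal sketch using the core Encode module, assuming the input arrives as UTF-8 octets; the sub name strip_non_word and the sample string are hypothetical:

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Hypothetical helper: keep only Unicode letters, decimal digits and "_".
# $bytes is assumed to hold UTF-8 encoded octets (e.g. from a query string).
sub strip_non_word {
    my ($bytes) = @_;
    my $text = decode('UTF-8', $bytes);   # octets -> characters
    $text =~ s/[^\p{L}\p{Nd}_]//g;        # the substitution above
    return encode('UTF-8', $text);        # characters -> octets
}

# Hypothetical input: "Ünïcödé-42 <b>&amp;" as UTF-8 octets.
my $input = encode('UTF-8', "\x{DC}n\x{EF}c\x{F6}d\x{E9}-42 <b>&amp;");
print strip_non_word($input), "\n";
```

Encoding back to UTF-8 on the way out means the result can be printed or embedded without "Wide character" warnings.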
In reply to Re: Strip utf-8 dangerous url chars by ikegami, in thread Strip utf-8 dangerous url chars by AlfaProject