in reply to Strip utf-8 dangerous url chars
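\w matches more than what Unicode considers "letters": besides digits and the underscore, it also accepts things like combining marks and connector punctuation. If you'd rather use Unicode's own definitions, that's easy to do, since Unicode provides properties relevant to your question.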
L: Letters
Nd: Number, Decimal Digit
So you'd use
s/[^\p{L}\p{Nd}_]//g
Update: "^" was missing!
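
For what it's worth, here's a minimal sketch of that substitution wrapped in a sub so it can be tried out. The name strip_unsafe and the sample strings are made up for illustration and are not from the original question. It also checks one character, U+2160 (ROMAN NUMERAL ONE, a "letter number" rather than a letter), that \w accepts but the explicit class removes.

```perl
use strict;
use warnings;

# Hypothetical helper: keep only Unicode letters (\p{L}),
# decimal digits (\p{Nd}) and the underscore; drop everything else.
sub strip_unsafe {
    my ($s) = @_;
    $s =~ s/[^\p{L}\p{Nd}_]//g;
    return $s;
}

# Sample input (an assumption for the demo), written with escapes so the
# source file's encoding doesn't matter:
# U+00E9 = e-acute, U+00F6 = o-umlaut, U+0663 = ARABIC-INDIC DIGIT THREE.
my $dirty = "h\x{e9}llo w\x{f6}rld/?x=1&y=\x{663}";
my $clean = strip_unsafe($dirty);

# U+2160 matches \w but is neither \p{L} nor \p{Nd}, so it gets stripped.
my $roman_is_word = ("\x{2160}" =~ /^\w$/) ? 1 : 0;
my $roman_kept    = length strip_unsafe("\x{2160}");

binmode(STDOUT, ':encoding(UTF-8)');
print "$clean\n";
print "U+2160 matches \\w: $roman_is_word, survives strip: $roman_kept\n";
```

Whether the stricter class is what you want depends on what you consider "dangerous". Note that it also drops combining marks, so text that isn't in a precomposed form can lose its accents.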
Re^2: Strip utf-8 dangerous url chars
by AlfaProject (Beadle) on Apr 04, 2011 at 11:43 UTC
by ikegami (Patriarch) on Apr 04, 2011 at 15:52 UTC