in reply to Regex For Removing Emoji
Also see Text::Unidecode and, especially for sanitizing titles for URLs, Text::CleanFragment.
Both err on the side of leaving things out rather than keeping things in.
It seems your regular expressions attempt to remove whole Unicode character planes. Personally, I would explicitly allow some character planes or look at the Unicode properties (maybe via Unicode::Tussle) to find out whether a character is part of a script.
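A minimal sketch of the allow-list idea, using only the Unicode properties built into Perl's regex engine. The sub name keep_allowed and the particular set of allowed properties (Latin and Greek scripts, combining marks, decimal digits, whitespace, punctuation) are my assumptions for illustration, not something from the original question; adjust the list to whatever you actually want to keep.

```perl
use strict;
use warnings;

# Hypothetical helper: delete every character that is NOT on an
# explicit allow-list of Unicode properties (assumed list, adjust to taste).
sub keep_allowed {
    my ($text) = @_;
    $text =~ s/[^\p{Script=Latin}\p{Script=Greek}\p{Mark}\p{Nd}\p{Space}\p{Punctuation}]//g;
    return $text;
}

binmode(STDOUT, ':encoding(UTF-8)');

# U+00E9 (e with acute) is in the Latin script and is kept;
# U+1F600 (grinning face) matches none of the allowed properties and is removed.
print keep_allowed("Caf\x{E9} \x{1F600}!"), "\n";
```

Emoji mostly have Script=Common and General_Category=So, so they fall out without ever being named in the pattern. The same is true of many other symbols, so check the result against real input before relying on it.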
Also consider what you want to do with character art: (╯°□°)╯︵ ┻━┻
Re^2: Regex For Removing Emoji
by Beaker (Beadle) on Nov 12, 2016 at 17:02 UTC