Also see Text::Unidecode and, especially for sanitizing titles for URLs, Text::CleanFragment.
Both err on the side of leaving things out rather than keeping things in.
It seems your regular expressions attempt to remove whole Unicode character planes. Personally, I would explicitly allow some character planes, or look at the Unicode properties (maybe via Unicode::Tussle) to find out whether a character is part of a script.
Also consider what you want to do with character art: (╯°□°)╯︵ ┻━┻
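A minimal sketch of the allow-list idea, using only core Perl Unicode properties. The helper name `keep_allowed` and the particular choice of scripts and categories are assumptions for illustration, not a recommendation for your data:

```perl
use strict;
use warnings;

# Hypothetical helper: keep only characters on an explicit allow-list
# and delete everything else, instead of trying to enumerate what to remove.
sub keep_allowed {
    my ($text) = @_;

    # Allowed here (an assumption, adjust to taste):
    #   letters from the Latin, Greek and Cyrillic scripts,
    #   nonspacing combining marks (Mn), decimal digits (Nd),
    #   punctuation (P) and space separators (Zs).
    $text =~ s/[^\p{Script=Latin}\p{Script=Greek}\p{Script=Cyrillic}\p{Mn}\p{Nd}\p{P}\p{Zs}]//g;

    return $text;
}

# U+1F30D (an emoji) is removed, U+00F6 (o with diaeresis) is kept.
my $clean = keep_allowed("Hello \x{1F30D} W\x{F6}rld!");

binmode(STDOUT, ':encoding(UTF-8)');
print "$clean\n";
```

Note that an allow-list like this also drops most of the table-flip example, since the box-drawing characters are symbols (`\p{So}`) rather than letters or punctuation, so character art needs its own decision.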
In reply to Re: Regex For Removing Emoji by Corion, in thread Regex For Removing Emoji by Beaker