in reply to Regex For Removing Emoji

Also see Text::Unidecode and, especially for sanitizing titles for URLs, Text::CleanFragment.

Both err on the side of leaving things out rather than keeping things in.

It seems your regular expressions attempt to remove whole Unicode character planes. Personally, I would explicitly allow some character planes or look at the Unicode properties (maybe via Unicode::Tussle) to find out whether a character is part of a script.
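
As a rough illustration of the allowlist approach, here is a minimal sketch using only core Perl Unicode properties. The helper name keep_allowed and the particular choice of scripts (Latin, Greek, Cyrillic) are my assumptions for the example, not something from the original question; adjust the list to whatever your data needs.

```perl
use strict;
use warnings;

# Hypothetical helper: keep only characters matching an explicit
# allowlist of Unicode properties and drop everything else.
sub keep_allowed {
    my ($text) = @_;
    # Allowed (an assumption for this sketch): Latin, Greek and Cyrillic
    # script characters, combining marks, decimal digits, punctuation
    # and whitespace. Anything outside this set is removed.
    $text =~ s/[^\p{Script=Latin}\p{Script=Greek}\p{Script=Cyrillic}\p{Mark}\p{Nd}\p{Punctuation}\p{Space}]//g;
    return $text;
}

binmode STDOUT, ':encoding(UTF-8)';
# U+1F600 is an emoji; it is in none of the allowed sets.
print keep_allowed("Caf\x{E9} \x{1F600} ok!"), "\n";
```

The emoji is dropped while the accented letter survives. Note that \p{Mark} also lets variation selectors through, and that most of the symbols used in character art would be stripped by this list, so it is a starting point rather than a finished filter.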

Also consider what you want to do with character art: (╯°□°)╯︵ ┻━┻

Re^2: Regex For Removing Emoji
by Beaker (Beadle) on Nov 12, 2016 at 17:02 UTC
    Thanks, I will check out the modules you mentioned. I do a lot of manual text sanitization and manipulation, so I should probably try to "third party" some of it.