I have a number of strings that end in the Unicode character U+200E (UTF-8 bytes E2 80 8E). A bit of searching tells me this is the left-to-right mark (LRM), and it's not uncommon to find it in user-entered data. These strings were converted from user-entered links on Wikimedia Commons, such as http://commons.wikimedia.org/wiki/File:Atelerix_algirus.jpg%E2%80%8E
I'm trying to trim "whitespace" from the end of these strings. This character fits my expectation of "whitespace", but of course it doesn't match \s in a regex. I guess there are a number of other invisible Unicode control and formatting characters that are basically pointless at the end of a string. Are there any Perl modules that will trim strings, taking these Unicode characters into account? Or do I have to look them all up myself :(
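To make the goal concrete, here is a minimal sketch of the kind of trim I have in mind, not a definitive answer. It assumes the string has already been percent-decoded and decoded from UTF-8 into Perl characters, and that "pointless at the end of a string" can be approximated by the Unicode general categories Cf (format, which includes the LRM) and Cc (control). The helper name trim_trailing_invisible is made up for illustration.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Hypothetical helper (name is an assumption): strip trailing whitespace
# plus invisible format (Cf) and control (Cc) characters from a string
# that has already been decoded into Perl characters.
sub trim_trailing_invisible {
    my ($str) = @_;
    $str =~ s/[\s\p{Cf}\p{Cc}]+\z//u;    # /u forces Unicode rules for \s
    return $str;
}

# Percent-decode the example URL by hand, then decode the UTF-8 bytes
# so that E2 80 8E becomes the single character U+200E.
my $url = 'http://commons.wikimedia.org/wiki/File:Atelerix_algirus.jpg%E2%80%8E';
(my $bytes = $url) =~ s/%([0-9A-Fa-f]{2})/chr(hex($1))/ge;
my $chars = decode('UTF-8', $bytes);

my $clean = trim_trailing_invisible($chars);
print "$clean\n";    # http://commons.wikimedia.org/wiki/File:Atelerix_algirus.jpg
```

Only the end of the string is touched, so an LRM in the middle of a string is left alone. Whether Cf and Cc cover every character worth trimming is exactly the part I'm unsure about.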
In reply to Remove unicode "whitespace" by HYanWong