HYanWong has asked for the wisdom of the Perl Monks concerning the following question:
I have a number of strings which end in the Unicode character U+200E, which appears as the UTF-8 bytes E2 80 8E. A bit of searching tells me this is the left-to-right mark (LRM), and it's not uncommon to find it in user-entered data. My strings come from user-entered links to Wikimedia Commons, such as http://commons.wikimedia.org/wiki/File:Atelerix_algirus.jpg%E2%80%8E
I'm trying to trim "whitespace" from the end of these strings. The LRM fits my expectation of "whitespace", but of course it doesn't match \s in a regex. I guess there are a number of other invisible Unicode characters that are basically pointless at the end of a string. Are there any Perl modules that will trim strings, taking these Unicode characters into account? Or do I have to look them all up myself :(
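For illustration only, here is a minimal sketch of one way to approach this without a dedicated module. It assumes the strings have first been percent-decoded and decoded from UTF-8 into Perl character strings, and it treats the Unicode general category Cf ("Format", which includes U+200E) as trimmable alongside \s. The helper names trim_trailing_invisible and url_to_chars are hypothetical, not from any CPAN module, and whether every Cf character is safe to drop depends on your data.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Hypothetical helper: strip trailing whitespace and Unicode "format"
# characters (general category Cf, which includes U+200E LEFT-TO-RIGHT MARK).
sub trim_trailing_invisible {
    my ($str) = @_;
    $str =~ s/[\s\p{Cf}]+\z//;
    return $str;
}

# Hypothetical helper: percent-decode a URL, then interpret the resulting
# bytes as UTF-8 so that E2 80 8E becomes the single character U+200E.
sub url_to_chars {
    my ($url) = @_;
    (my $bytes = $url) =~ s/%([0-9A-Fa-f]{2})/chr(hex($1))/ge;
    return decode('UTF-8', $bytes);
}

my $url   = 'http://commons.wikimedia.org/wiki/File:Atelerix_algirus.jpg%E2%80%8E';
my $clean = trim_trailing_invisible(url_to_chars($url));
print "$clean\n";
```

The character class could be widened (for example with \p{Cc} for control characters) or narrowed to an explicit list such as \x{200E}\x{200F} if trimming all format characters is too aggressive.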
Replies are listed 'Best First'.

Re: Remove unicode "whitespace"
by 7stud (Deacon) on Feb 28, 2013 at 03:36 UTC
by HYanWong (Acolyte) on Feb 28, 2013 at 11:20 UTC
by Ratazong (Monsignor) on Feb 28, 2013 at 07:49 UTC
by daxim (Curate) on Feb 28, 2013 at 09:57 UTC

Re: Remove unicode "whitespace"
by Khen1950fx (Canon) on Feb 28, 2013 at 05:23 UTC
by HYanWong (Acolyte) on Feb 28, 2013 at 11:16 UTC
by Khen1950fx (Canon) on Feb 28, 2013 at 16:11 UTC
by HYanWong (Acolyte) on Mar 01, 2013 at 01:44 UTC

Re: Remove unicode "whitespace"
by ikegami (Patriarch) on Mar 01, 2013 at 10:25 UTC