Find and replace Non breaking space characters in a UTF8 text file

h@kim has asked for the wisdom of the Perl Monks concerning the following question:

I have some documents encoded in UTF-8 that contain non-English characters. For example, I have one document in Arabic, one in Urdu, and one in Persian (Farsi). These documents contain special characters such as the non-breaking space, the zero-width non-joiner, the non-breaking hyphen, and so on. I want to find these special characters and replace them with an ordinary space. How can I do this in Perl?
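Here is a minimal sketch of one way to do this. It is not taken from any reply in the thread. It assumes the files really are UTF-8 and that the characters to replace are U+00A0 NO-BREAK SPACE, U+200C ZERO WIDTH NON-JOINER, U+2011 NON-BREAKING HYPHEN and U+202F NARROW NO-BREAK SPACE. The last one is my own guess at "and so on", so adjust the character class to suit. The subroutine names clean_text and clean_file are hypothetical. The important parts are the :encoding(UTF-8) layers, which make the regex see characters rather than raw bytes. The characters are written as \x{...} escapes, so the script itself does not need "use utf8".

```perl
use strict;
use warnings;

# Characters to turn into an ordinary space (assumed list; extend as needed):
#   U+00A0 NO-BREAK SPACE, U+200C ZERO WIDTH NON-JOINER,
#   U+2011 NON-BREAKING HYPHEN, U+202F NARROW NO-BREAK SPACE
my $SPECIAL = qr/[\x{00A0}\x{200C}\x{2011}\x{202F}]/;

# Return a copy of $text with every special character replaced by a space.
sub clean_text {
    my ($text) = @_;
    (my $copy = $text) =~ s/$SPECIAL/ /g;
    return $copy;
}

# Read $in_path as UTF-8, clean it line by line, write UTF-8 to $out_path.
sub clean_file {
    my ($in_path, $out_path) = @_;
    open my $in,  '<:encoding(UTF-8)', $in_path  or die "Cannot read $in_path: $!";
    open my $out, '>:encoding(UTF-8)', $out_path or die "Cannot write $out_path: $!";
    while (my $line = <$in>) {
        print {$out} clean_text($line);
    }
    close $out or die "Cannot close $out_path: $!";
    close $in;
    return;
}

# Run as: perl clean.pl input.txt output.txt
clean_file(@ARGV) if @ARGV == 2;
```

One caution: in Persian the zero-width non-joiner is part of correct spelling, because it separates pieces of a single word. Replacing it with a space therefore changes the text. If that matters, remove \x{200C} from the class.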


Re: Find and replace Non breaking space characters in a UTF8 text file by flexvault (Monsignor) on Jun 30, 2012 at 13:47 UTC
h@kim,

Doing a 'Super Search' on "utf8 replace special characters", I found this thread: http://www.perlmonks.org/?node_id=481164. It should be a good start.

Good luck!

"Well done is better than well said." - Benjamin Franklin
Re: Find and replace Non breaking space characters in a UTF8 text file by zentara (Cardinal) on Jun 30, 2012 at 15:48 UTC
See the node "UTF-8 and File::Find". It shows how to search for a Unicode character. You can easily add a substitution regex (for example, $_ =~ s/本/ /g) inside the file-reading loop.

I'm not really a human, but I play one on earth.
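Following the File::Find suggestion, here is a rough sketch that walks a directory tree and rewrites matching files in place. Several details are assumptions: the .txt filter, the in-place rewrite, and the character list. The subroutine names fix_file and fix_tree are hypothetical. Back up the files before trying it. A literal such as 本 in a regex only matches as a single character when the script has "use utf8;" in effect. This sketch avoids that issue by using \x{...} escapes. It also reads each file whole instead of looping over lines, which keeps the in-place rewrite simple.

```perl
use strict;
use warnings;
use File::Find qw(find);

# Assumed set of characters to replace with an ordinary space.
my $SPECIAL = qr/[\x{00A0}\x{200C}\x{2011}\x{202F}]/;

# Rewrite one UTF-8 file in place; return the number of replacements made.
sub fix_file {
    my ($path) = @_;
    open my $in, '<:encoding(UTF-8)', $path or die "Cannot read $path: $!";
    my $text = do { local $/; <$in> } // '';    # slurp the whole file
    close $in;
    my $count = ($text =~ s/$SPECIAL/ /g) || 0;
    if ($count) {                               # only rewrite changed files
        open my $out, '>:encoding(UTF-8)', $path or die "Cannot write $path: $!";
        print {$out} $text;
        close $out or die "Cannot close $path: $!";
    }
    return $count;
}

# Walk $root recursively and fix every *.txt file; return the total count.
sub fix_tree {
    my ($root) = @_;
    my $total = 0;
    find(
        {
            no_chdir => 1,    # $_ holds the full path instead of the basename
            wanted   => sub {
                return unless -f $_ && /\.txt\z/;
                $total += fix_file($File::Find::name);
            },
        },
        $root,
    );
    return $total;
}

# Run as: perl fixtree.pl /path/to/documents
print fix_tree($ARGV[0]), " replacement(s)\n" if @ARGV == 1;
```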