h@kim has asked for the wisdom of the Perl Monks concerning the following question:

I have some documents encoded in UTF-8 that contain non-English characters. For example, I have one document in Arabic, one in Urdu, and one in Persian (Farsi). These documents contain special characters such as the non-breaking space, the zero-width non-joiner, the non-breaking hyphen, and so on. I want to find these special characters and replace each of them with a space. How can I do this in Perl?

Re: Find and replace Non breaking space characters in a UTF8 text file
by flexvault (Monsignor) on Jun 30, 2012 at 13:47 UTC

    h@kim,

    Doing a 'Super Search' on "utf8 replace special characters", I found this: http://www.perlmonks.org/?node_id=481164. It should be a good start.

    Good Luck

    "Well done is better than well said." - Benjamin Franklin
Re: Find and replace Non breaking space characters in a UTF8 text file
by zentara (Cardinal) on Jun 30, 2012 at 15:48 UTC
    See UTF-8 and File::Find. It shows how to search for a Unicode character; you can easily add a substitution regex (for example, $_ =~ s/本/ /g) in the file-reading loop.
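
    As a rough illustration of that idea applied to the original question, here is a minimal sketch. It assumes the three characters of interest are U+00A0 (no-break space), U+200C (zero width non-joiner), and U+2011 (non-breaking hyphen); the names clean_text and clean_file are made up for this example, and the character class would need extending to cover whatever "and so on" means for your documents.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Characters treated as "special" here (an assumption; extend as needed):
#   U+00A0 NO-BREAK SPACE
#   U+200C ZERO WIDTH NON-JOINER
#   U+2011 NON-BREAKING HYPHEN
my $special = qr/[\x{00A0}\x{200C}\x{2011}]/;

# Replace every special character in an already-decoded string with a space.
sub clean_text {
    my ($text) = @_;
    $text =~ s/$special/ /g;
    return $text;
}

# Read a UTF-8 file line by line, clean each line, and write a new UTF-8 file.
# The :encoding(UTF-8) layers decode on input and encode on output, so the
# regex matches characters rather than raw bytes.
sub clean_file {
    my ($in_path, $out_path) = @_;
    open my $in,  '<:encoding(UTF-8)', $in_path
        or die "Cannot read $in_path: $!";
    open my $out, '>:encoding(UTF-8)', $out_path
        or die "Cannot write $out_path: $!";
    while (my $line = <$in>) {
        print {$out} clean_text($line);
    }
    close $in;
    close $out or die "Cannot close $out_path: $!";
}

# Hypothetical usage: perl clean.pl input.txt output.txt
clean_file(@ARGV) if @ARGV == 2;
```

    The important part is opening the files with an encoding layer; without it Perl sees the file as bytes, and a pattern written with \x{...} code points will not match the multi-byte UTF-8 sequences.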

    I'm not really a human, but I play one on earth.
    Old Perl Programmer Haiku ................... flash japh