For a good list of tools and references, see the page About RTF, Rich Text Format. It links to the Perl RTF modules on CPAN, of which RTF::Tokenizer looks like it would be useful for this purpose. Read in each token and, if it's a text token, modify it as needed; then write the token back out. You'll have to check how it deals with UTF-8: it might convert the characters to code points for you, or it might leave the text as UTF-8.
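
To make the read-modify-write loop concrete, here is a rough sketch of the idea. It is written in Python with a deliberately simplified, hand-rolled tokenizer, so it is not RTF::Tokenizer's API; `TOKEN_RE` and `transform_rtf_text` are names made up for this illustration, and things like `\bin` data are not handled.

```python
import re

# Simplified RTF token pattern (an assumption for illustration, not a full
# RTF grammar). Alternatives are tried left to right.
TOKEN_RE = re.compile(
    r"(?P<control>\\[a-zA-Z]+-?\d* ?)"  # control word, optional number, optional delimiter space
    r"|(?P<hex>\\'[0-9a-fA-F]{2})"      # hex-escaped byte such as \'e9
    r"|(?P<symbol>\\[^a-zA-Z]?)"        # control symbol such as \{ \} \\ \~
    r"|(?P<group>[{}])"                 # group open / close
    r"|(?P<text>[^\\{}]+)"              # run of plain text
)


def transform_rtf_text(rtf, modify):
    """Apply `modify` to every plain-text token and copy all other tokens through."""
    out = []
    for match in TOKEN_RE.finditer(rtf):
        token = match.group(0)
        # lastgroup is the name of the alternative that matched
        if match.lastgroup == "text":
            token = modify(token)
        out.append(token)
    return "".join(out)


if __name__ == "__main__":
    sample = r"{\rtf1\ansi Hello, {\b world}!}"
    print(transform_rtf_text(sample, str.upper))
```

With this sketch, `transform_rtf_text(r"{\rtf1\ansi Hello, {\b world}!}", str.upper)` returns `{\rtf1\ansi HELLO, {\b WORLD}!}`. Escapes such as `\'e9` are copied through unchanged. Note that text tokens also include things like the font names inside the font table, so a real filter would probably need to keep track of which group it is in and leave those alone.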