in reply to What's the 'M-' characters and how to filter/correct them?
So, my monks, could you kindly let me know where these 'M-' characters come from? Are they also control characters?
Another question: is it possible to use Perl to clean up/correct these 'M-' characters?
These are characters with the 8th bit set, and M-s is cat's way of displaying an "ó" when you ask it to show non-printing characters (for example with -v). The examples you provided are ISO-8859 (or some variant). If your database uses the same encoding, there's no need to clean up your data. If your database uses UTF-8, the following should suffice
use Encode qw(from_to);
while (<>) {
    from_to($_, 'latin1', 'utf-8');
    print;
}
to convert your data to UTF-8.
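To see where the 'M-' notation comes from, here is a minimal sketch of the rule cat follows when it displays non-printing bytes. The helper name cat_v_byte is made up for illustration, and the real cat -v additionally leaves tabs and newlines untouched, which this sketch does not do.

```perl
use strict;
use warnings;

# Hypothetical helper: render a single byte roughly the way `cat -v` does.
# Simplification: real `cat -v` passes tab and newline through unchanged.
sub cat_v_byte {
    my ($byte) = @_;
    my $n   = ord $byte;
    my $out = '';
    if ($n >= 128) {    # 8th bit set: emit "M-" and clear that bit
        $out = 'M-';
        $n  -= 128;
    }
    if    ($n == 127) { $out .= '^?' }                  # DEL
    elsif ($n < 32)   { $out .= '^' . chr($n + 64) }    # control characters
    else              { $out .= chr $n }                # printable ASCII
    return $out;
}

print cat_v_byte("\xF3"), "\n";    # Latin-1 "ó" is byte 0xF3; prints M-s
```

So 0xF3 with the high bit cleared is 0x73, which is "s", hence "M-s". These are not control characters, just ordinary Latin-1 letters shown in a 7-bit-safe notation.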