sylph001 has asked for the wisdom of the Perl Monks concerning the following question:
Respected monks,
Recently I received an Excel spreadsheet and extracted its data so that I can load it into a database.
However, in the extracted data I see some stray 'M-' sequences.
Interestingly, some of the 'M-' sequences seem to stand in for special characters that are visible in the original spreadsheet, while others look trivial.
I paste an example of each below:
'M-' that appears to come from a special character:
user@server> cat data1.txt
Depósito Centralizado
user@server> cat -v data1.txt
DepM-ssito Centralizado
'M-' that looks trivial:
user@server> cat data2.txt
London and NewYork
user@server> cat -v data2.txt
London andM- NewYork
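In case it helps to reproduce this without my files, here is a minimal sketch. It assumes (a guess from the `cat -v` output above, not something verified in the files themselves) that the accented letter is stored as the single byte 0xF3 and that the gap in the second example is the single byte 0xA0. The variable names are only for illustration.

```shell
# Assumption: 0xF3 (octal 363) stands for the accented letter and
# 0xA0 (octal 240) for the odd gap; neither byte value is confirmed.
sample1=$(printf 'Dep\363sito Centralizado\n' | cat -v)
sample2=$(printf 'London and\240NewYork\n' | cat -v)

# Print what cat -v makes of each hand-built line.
printf '%s\n' "$sample1" "$sample2"
```

With those assumed bytes, the output matches the two `cat -v` lines shown above.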
The spreadsheet was generated on a Windows platform, and I'm extracting it on Linux.
To filter out the commonly mentioned "Windows control characters" such as '^M', I ran dos2unix on each of the data files, but the 'M-' characters did not disappear.
So, monks, could you kindly tell me where these 'M-' characters come from? Are they control characters too?
My other question: is it possible to use Perl to clean up or correct these 'M-' characters?
Many thanks in advance