in reply to Re: Convert Windows-1252 Characters to Java Unicode Notation
Simple and elegant.

    # Convert Windows-1252 characters into Java's Unicode notation...
    $md->{$column} =~ s{([\x80-\xFF])}{
        sprintf "\\u%04x", ord decode('cp1252', $1)
    }eg;

(By the way, the frequency of occurrence of non-US-ASCII characters in the data is very low in relation to the amount of text. So-called 8-bit characters are infrequent and usually occur in isolation.)
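
For anyone who wants to try the substitution outside the original program, here is a minimal standalone sketch. The helper name `cp1252_to_java_escapes` and the sample string are made up for illustration; the only assumptions are that the input is a byte string in Windows-1252 and that `decode` comes from the core Encode module.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Hypothetical helper (not from the original code): replace every
# high byte (0x80-0xFF) of a Windows-1252 byte string with a Java
# \uXXXX escape for the corresponding Unicode code point.
sub cp1252_to_java_escapes {
    my ($bytes) = @_;
    $bytes =~ s{([\x80-\xFF])}{
        sprintf "\\u%04x", ord decode('cp1252', $1)
    }eg;
    return $bytes;
}

# Made-up sample: 0xE9 is e-acute and 0x80 is the euro sign in Windows-1252.
my $sample = "caf\xE9 costs 5\x80";
print cp1252_to_java_escapes($sample), "\n";
# prints: caf\u00e9 costs 5\u20ac
```

Because the substitution handles one byte per match, it suits data like this, where the 8-bit characters are rare and isolated; plain ASCII passes through untouched.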
Jim
Replies are listed 'Best First'.
Re^3: Convert Windows-1252 Characters to Java Unicode Notation
by Juerd (Abbot) on Nov 25, 2007 at 23:05 UTC
Re^3: Convert Windows-1252 Characters to Java Unicode Notation
by bart (Canon) on Nov 26, 2007 at 11:55 UTC
by Jim (Curate) on Nov 26, 2007 at 17:55 UTC
by bart (Canon) on Nov 26, 2007 at 18:18 UTC
by Jim (Curate) on Nov 26, 2007 at 19:46 UTC