The solution above recommending code pages and Unicode sounds robust, and I would explore something similar. However, just for grins, here is a poor man's version that might work for you. I created a mapping between the DOS extended characters and the Windows (ANSI) extended characters and use that mapping to replace the appropriate characters. It's not overly robust and I'm not 100% sure it's correct, since I built the mapping by hand from DOS and ANSI character charts. Your mileage may vary.
Update: Note that there is no one-to-one mapping between the DOS and ANSI code pages. I mostly just mapped the accented letters and a few symbols (such as the cent sign). For most text, including foreign languages written in the Latin alphabet, this mapping should work fairly well. However, it's not a very robust solution and not very pretty code, so if you need to do a lot of this I would recommend one of the other solutions suggested in this thread.
Update #2: In case it's not clear, the hash %asc2dos maps each ANSI (i.e. Windows) extended character value to the equivalent DOS value. I then reverse the hash so that %dos2asc contains the mapping from DOS back to Windows. As an aside, does anyone have any suggestions for a better or more idiomatic way to reverse the hash (i.e. use the keys as values and vice versa) than what I did below?
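On the hash-reversal question: a commonly used idiom is to call reverse on the hash in list context. The sketch below uses a small illustrative subset of the table rather than the full mapping, and it assumes the values are unique (otherwise only one of the colliding keys survives).

```perl
use strict;
use warnings;

# Small illustrative subset of the ANSI => DOS table (not the full mapping)
my %asc2dos = (223 => 225, 224 => 133, 225 => 160);

# In list context a hash flattens to (key, value, key, value, ...).
# Reversing that list gives (value, key, ...), which is the inverted hash.
# Caveat: if two keys share a value, only one of them survives.
my %dos2asc = reverse %asc2dos;

print "$_ => $dos2asc{$_}\n" for sort { $a <=> $b } keys %dos2asc;
```

This does the same job as an explicit foreach loop over the keys; it just lets the list assignment do the work.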
#!/usr/local/bin/perl
use strict;
use warnings;
#mapping of ASCII to DOS
my %asc2dos = (
    131 => 159, 149 => 250, 150 => 196, 161 => 173, 162 => 155, 163 => 156,
    165 => 157, 170 => 166, 171 => 174, 172 => 170, 176 => 248, 177 => 241,
    178 => 253, 183 => 249, 186 => 167, 187 => 175, 188 => 172, 189 => 171,
    191 => 168, 196 => 142, 197 => 143, 198 => 146, 199 => 128, 201 => 144,
    209 => 165, 214 => 153, 220 => 154, 223 => 225, 224 => 133, 225 => 160,
    226 => 131, 228 => 132, 229 => 134, 230 => 145, 231 => 135, 232 => 138,
    233 => 130, 234 => 136, 235 => 137, 236 => 141, 237 => 161, 238 => 140,
    239 => 139, 241 => 164, 242 => 149, 243 => 162, 244 => 147, 246 => 148,
    247 => 246, 249 => 151, 250 => 163, 251 => 150, 252 => 129, 255 => 152,
);
#create the reverse mapping (DOS to ASCII)
my %dos2asc;
foreach my $key (sort keys %asc2dos)
{
    $dos2asc{$asc2dos{$key}} = $key;
}
#here's a test:
#create a string with some accented characters
my $string = pack("C10",223,224,225,232,231,236,237,241,243,244);
print "ASCII string = $string\n";
$string = asc2dos($string);
print "DOS string = $string\n";
$string = dos2asc($string);
print "ASCII string = $string\n";
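#convert ASCII extended characters to DOS extended characters
sub asc2dos
{
    my $str = shift;
    foreach my $i (0..length($str)-1)
    {
        my $val = ord substr($str,$i,1);
        #chr binds tighter than ||, so test for a mapping explicitly;
        #characters without a mapping pass through unchanged
        substr($str,$i,1) = chr(exists $asc2dos{$val} ? $asc2dos{$val} : $val);
    }
    return $str;
}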
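#convert DOS extended characters to ASCII characters
sub dos2asc
{
    my $str = shift;
    foreach my $i (0..length($str)-1)
    {
        my $val = ord substr($str,$i,1);
        #chr binds tighter than ||, so test for a mapping explicitly;
        #characters without a mapping pass through unchanged
        substr($str,$i,1) = chr(exists $dos2asc{$val} ? $dos2asc{$val} : $val);
    }
    return $str;
}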
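For comparison, the code-page approach mentioned at the top could look roughly like the sketch below, using the core Encode module. This is only a sketch under assumptions: it takes "DOS" to mean code page 437 and "Windows" to mean code page 1252 (other DOS code pages such as cp850 would need a different name), and the sub names ansi_to_dos and dos_to_ansi are made up for illustration.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Assumption: DOS text is cp437 and Windows (ANSI) text is cp1252.
# Characters that do not exist in the target code page are replaced by
# Encode's default substitution character instead of being passed through.
sub ansi_to_dos
{
    my ($bytes) = @_;
    return encode('cp437', decode('cp1252', $bytes));
}

sub dos_to_ansi
{
    my ($bytes) = @_;
    return encode('cp1252', decode('cp437', $bytes));
}

# a-grave, e-acute and n-tilde as cp1252 bytes
my $ansi = pack('C*', 224, 233, 241);
my $dos  = ansi_to_dos($ansi);
print "DOS byte values: ", join(',', unpack('C*', $dos)), "\n";
```

For these three characters the result agrees with the hand-built table (224, 233 and 241 map to 133, 130 and 164). The difference is that Encode covers the whole code page rather than a hand-picked subset.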