Re^7: What's the 'M-' characters and how to filter/correct them?

Further to shmem's post above:
If it's just a question of translating single-byte characters with the high bit set (code points 128-255) to some kind of pure-ASCII representation, it's possible to do this in one swell foop. Defining the translations can be a bit tedious, but it's done just once (update: and could even be done in a module for general inclusion). Note that in the code below, the German eszett/sharp-s "ß" translates to the ASCII letter pair "ss". See also the possibly useful discussion of something similar for multi-byte Unicode sequences in the recent thread "Read RegEx from file", particularly "Re: Read RegEx from file".

use warnings;
use strict;

use Data::Dump qw(dd);


my @acutes = qw(193 A  201 E  205 I  211 O  218 U  225 a  233 e  237 i
                243 o  250 u  221 Y  253 y);
my @graves = qw(192 A  200 E  204 I  210 O  217 U  224 a  232 e  236 i
                242 o  249 u);
my @others = qw(228 a  246 o  223 ss);

my %xlate = (@acutes, @graves, @others);
# dd \%xlate;  # FOR DEBUG

my ($search) =
    map  qr{[$_]}xms,
    join '',
    map  sprintf('\%03o', $_),
    keys %xlate
    ;
# dd $search;  # FOR DEBUG

while (my $line = <DATA>) {
    chomp $line;
    $line =~ s{ ($search) }{$xlate{ord $1}}xmsg;
    die "non-ascii in '$line'" if $line =~ m{ [[:^ascii:]] }xms;
    print "'$line' \n";
    }


__DATA__
spanish: Depósito Centralízado
french: voilà: a word with an accent gravè
vanilla: this is plain ascii
other: gräßliches Tröten

Output:

c:\@Work\Perl\monks\sylph001>perl xlate_to_ascii_1.pl
'spanish: Deposito Centralizado'
'french: voila: a word with an accent grave'
'vanilla: this is plain ascii'
'other: grassliches Troten'
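
For what it's worth, here is a rough sketch of how the "do it once" idea might be packaged as a reusable subroutine. The name to_ascii() is made up for illustration, and the sketch assumes the input holds one character per code point in the 128-255 range (e.g. Latin-1 bytes, or text that has already been decoded); raw UTF-8 bytes would need decoding first. The test string is written with \x escapes so the script itself stays pure ASCII.

```perl
use strict;
use warnings;

# Code point => ASCII replacement; same table as in the script above.
my %xlate = qw(
    193 A  201 E  205 I  211 O  218 U  221 Y
    225 a  233 e  237 i  243 o  250 u  253 y
    192 A  200 E  204 I  210 O  217 U
    224 a  232 e  236 i  242 o  249 u
    228 a  246 o  223 ss
);

# Build one character class covering every code point in the table.
my $class  = join '', map { sprintf '\\x{%02X}', $_ } keys %xlate;
my $search = qr{ [$class] }xms;

# Hypothetical helper: returns a copy of the text with every
# character found in %xlate replaced by its ASCII stand-in.
sub to_ascii {
    my ($text) = @_;
    $text =~ s{ ($search) }{$xlate{ord $1}}xmsg;
    return $text;
}

# \xE4 is a-umlaut, \xDF is eszett, \xF6 is o-umlaut.
print to_ascii("gr\xE4\xDFliches Tr\xF6ten"), "\n";   # prints: grassliches Troten
```

Characters that are not in the table pass through untouched, so a caller would still want the [[:^ascii:]] check afterwards to catch anything the table does not cover.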

Give a man a fish: <%-{-{-{-<
