I am trying to read a text file that contains special (accented) characters, and then replace each of those characters in my string with its HTML entity whenever the program finds one. Here is my program:


use Encode;
use utf8;
#use open IO => ':locale';

#my $s = "El supersÃ³nico de los Indi ";
my $s1 = "El supero de los Indi ";

#$s1 = decode_utf8( $s);

print "\n\nStart string: $s1\n\n";

my $s2 = &fix_special_characters($s1);

print"\nEnd string: $s2\n\n";


sub fix_special_characters
{
my($string) = @_;

open(C,"<:utf8","chars.txt");
my @c = <C>;

for(my $i=0; $i < @c; $i++)
{
        my ($special,$htmlchar) = split(/\t/,$c[$i]);
        print "$special : $htmlchar";
        $string =~ s/$special/$htmlchar/ig;     ## this is generating the error message
}

return $string;
}

This is the output when the substitution line (line 30) is commented out:

Start string: El supero de los Indi

Á    : &Aacute;
á    : &aacute;
É    : &Eacute;
é   : &eacute;
Í    : &Iacute;
í    : &iacute;
Ñ    : &Ntilde;
ñ    : &ntilde;
Ó   : &Oacute;
ó   : &oacute;
Ú    : &Uacute;
ú    : &uacute;
Ü    : &Uuml;
ü    : &uuml;
¿   : &iquest;
¡   : &iexcl;
End string: El supero de los Indi

However, when I uncomment that substitution line, I get the following error messages for every line in the char file:

Malformed UTF-8 character (unexpected non-continuation byte 0x20, immediately after start byte 0xc1) in regexp compilation at sp.pl line 30, <C> line 16.
Malformed UTF-8 character (unexpected non-continuation byte 0x20, immediately after start byte 0xc1) in regexp compilation at sp.pl line 30, <C> line 16.

I have spent hours trying different methods to make this work with no luck. Any monks out there that can help with this? Thank you

In reply to utf-8 problem by Anonymous Monk
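
One possible reading of the error: byte 0xc1 is "Á" in ISO-8859-1, and in valid UTF-8 that byte could never be followed directly by a space (0x20). That suggests chars.txt may have been saved as Latin-1 rather than UTF-8, while the ":utf8" layer marks the data as UTF-8 without validating it. The following is only a minimal sketch under that assumption. The helper name load_entity_map, the in-memory stand-in for chars.txt, and the sample string are illustrative and not part of the original program.

```perl
use strict;
use warnings;
use utf8;                       # string literals in this source file are UTF-8
use Encode qw(encode);

binmode STDOUT, ':encoding(UTF-8)';

# Stand-in for chars.txt, ASSUMED here to be stored as ISO-8859-1.
# The first entry has a trailing space before the tab, as the error hints.
my $table_bytes = encode('ISO-8859-1',
    "Á \t&Aacute;\ná\t&aacute;\nó\t&oacute;\nñ\t&ntilde;\n");

# Hypothetical helper: read "char<TAB>entity" lines using an explicit,
# validating encoding layer and return a hash reference.
sub load_entity_map {
    my ($source, $encoding) = @_;
    open my $fh, "<:encoding($encoding)", $source
        or die "Cannot open character table: $!";
    my %entity_for;
    while (my $line = <$fh>) {
        chomp $line;
        my ($char, $entity) = split /\t/, $line, 2;
        next unless defined $entity;
        s/^\s+|\s+$//g for $char, $entity;   # drop stray spaces and CRs
        $entity_for{$char} = $entity if length $char;
    }
    close $fh;
    return \%entity_for;
}

# Replace every mapped character in one pass. There is no /i flag, so
# "Á" and "á" keep their own entities.
sub fix_special_characters {
    my ($string, $entity_for) = @_;
    return $string unless %$entity_for;
    my $alternation = join '|', map { quotemeta } keys %$entity_for;
    $string =~ s/($alternation)/$entity_for->{$1}/g;
    return $string;
}

# For a real file, pass 'chars.txt' instead of the scalar reference.
my $map = load_entity_map(\$table_bytes, 'ISO-8859-1');
print fix_special_characters("El supersónico", $map), "\n";
# prints: El supers&oacute;nico
```

If the file turns out to be UTF-8 after all, passing 'UTF-8' as the encoding should be enough. Unlike the bare ":utf8" layer, ":encoding(UTF-8)" checks the input, so bad bytes are reported when the file is read instead of later during regexp compilation.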
