    use Encode;
    use utf8;
    #use open IO => ':locale';

    #my $s = "El supersónico de los Indi ";
    my $s1 = "El supero de los Indi ";
    #$s1 = decode_utf8( $s);

    print "\n\nStart string: $s1\n\n";
    my $s2 = &fix_special_characters($s1);
    print"\nEnd string: $s2\n\n";

    sub fix_special_characters {
        my($string) = @_;
        open(C,"<:utf8","chars.txt");
        my @c = <C>;
        for(my $i=0; $i < @c; $i++) {
            my ($special,$htmlchar) = split(/\t/,$c[$i]);
            print "$special : $htmlchar";
            $string =~ s/$special/$htmlchar/ig; ## this is generating the error message
        }
        return $string;
    }

This is the output when the substitution line (line 30) is commented out:

    Start string: El supero de los Indi

    Á : Á
    á : á
    É : É
    é : é
    Í : Í
    í : í
    Ñ : Ñ
    ñ : ñ
    Ó : Ó
    ó : ó
    Ú : Ú
    ú : ú
    Ü : Ü
    ü : ü
    ¿ : ¿
    ¡ : ¡
    End string: El supero de los Indi

However, when I uncomment that substitution line, I get the following error message for every line in the char file:

    Malformed UTF-8 character (unexpected non-continuation byte 0x20, immediately after start byte 0xc1) in regexp compilation at sp.pl line 30, <C> line 16.
    Malformed UTF-8 character (unexpected non-continuation byte 0x20, immediately after start byte 0xc1) in regexp compilation at sp.pl line 30, <C> line 16.

I have spent hours trying different methods to make this work, with no luck. Are there any monks out there who can help with this?

Thank you
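One possible reading of the "Malformed UTF-8 character ... start byte 0xc1" message: byte 0xC1 is "Á" in ISO-8859-1, so chars.txt may actually be Latin-1 while being opened with the `:utf8` layer. The sketch below works under that assumption and is not a confirmed diagnosis. The `$encoding` parameter, the temporary mapping file, and the `&oacute;` / `&Aacute;` entity values are hypothetical stand-ins for the real chars.txt contents.

```perl
use strict;
use warnings;
use utf8;                                # string literals in this file are UTF-8
use File::Temp qw(tempfile);

# Replace each special character in $string using a tab-separated mapping file.
# $encoding (hypothetical parameter) names the encoding the file is really in.
sub fix_special_characters {
    my ($string, $file, $encoding) = @_;
    open(my $fh, "<:encoding($encoding)", $file)
        or die "Cannot open $file: $!";
    while (my $line = <$fh>) {
        chomp $line;                     # keep the newline out of the replacement
        my ($special, $htmlchar) = split /\t/, $line, 2;
        next unless defined $special && length $special && defined $htmlchar;
        # \Q...\E quotes regex metacharacters; /i is omitted on purpose so that
        # an upper-case entry does not also match its lower-case counterpart.
        $string =~ s/\Q$special\E/$htmlchar/g;
    }
    close $fh;
    return $string;
}

# Demo only: write a small Latin-1 mapping file standing in for chars.txt.
my ($tmp, $path) = tempfile(UNLINK => 1);
binmode($tmp, ':encoding(iso-8859-1)');
print {$tmp} "ó\t&oacute;\n", "Á\t&Aacute;\n";
close $tmp;

binmode(STDOUT, ':encoding(UTF-8)');
my $fixed = fix_special_characters("El supersónico de los Indi", $path, 'iso-8859-1');
print "$fixed\n";
```

If chars.txt turns out to be valid UTF-8 after all, passing `'UTF-8'` as the encoding keeps the same code path. The `chomp` and `\Q...\E` changes are useful in either case.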
In reply to utf-8 problem by Anonymous Monk