Problem: working on my Mac, I had $var = "dinámico", which would print with garbled characters in place of the á.
Reason: the non-English text was stored in my Cocoa-based text editor as UTF-8. Printing it directly would print gobbledegook. Even trying to escape it using HTML::Entities would not work, as that would just escape gobbledegook.
Solution: First decode the UTF-8, then encode it using HTML::Entities. So, use Encode; use HTML::Entities; encode_entities(decode_utf8($var)) does the trick. Now I get din&aacute;mico, which prints fine in the browser.
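A minimal sketch of that decode-then-encode step, assuming the string arrives as raw UTF-8 bytes. The bytes are written with explicit \x escapes so the example does not depend on how the script file itself is saved; the variable names are only illustrative.

```perl
use strict;
use warnings;
use Encode qw(decode_utf8);
use HTML::Entities qw(encode_entities);

# Raw UTF-8 bytes for "dinámico" (the á is the two bytes C3 A1).
my $bytes = "din\xC3\xA1mico";

# Step 1: turn the byte string into a Perl character string.
my $chars = decode_utf8($bytes);

# Step 2: replace non-ASCII and HTML-special characters with entities.
my $escaped = encode_entities($chars);

print "$escaped\n";
```

Run as is, this should print din&aacute;mico. Skipping step 1 would make encode_entities escape each of the two bytes separately, which is the "escaped gobbledegook" described above.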
The following web page, served as plain HTML from my MacBook Pro by a stock, untinkered Apache, renders just fine.
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd">
<html>
<head>
  <meta http-equiv="content-type" content="text/html; charset=utf-8">
  <title>test of accents</title>
</head>
<body>
  <h1>English</h1>
  <p>explores the complex dynamic between people and conservation as part of the mission</p>
  <h1>Spanish</h1>
  <p>explora el complejo dinámico entre la gente y la conservación como parte de la misión</p>
  <h1>Portuguese</h1>
  <p>explora a dinâmica complexa entre pessoas e a conservação como parte da missão</p>
</body>
</html>
The same page served via Perl/HTML::Template/CGI::App mechanism renders like crap, unless I go to my browser and change the text encoding to Mac-Roman (this is not required in the above case that works just fine, as is).
# in my Perl script
sub getInterfaceText {
    my ($lang) = @_;
    # interface strings, keyed by two-letter language code
    my %text = (
        "en" => {
            "msg" => qq| explores the complex dynamic between people and conservation as part of the mission |,
        },
        "es" => {
            "msg" => qq| explora el complejo dinámico entre la gente y la conservación como parte de la misión |,
        },
        "pt" => {
            "msg" => qq| explora a dinâmica complexa entre pessoas e a conservação como parte da missão |,
        },
    );
    return $text{$lang}->{msg};
}

# explicit ?lang= parameter first, then the browser's preference, then English
my $lang = $cgi->param('lang') || substr(lc $ENV{"HTTP_ACCEPT_LANGUAGE"}, 0, 2) || "en";
my $msg = getInterfaceText($lang);
$tmpl->param(LANG => $lang, MSG => $msg);

#----
# in my html page, retrieved as http://path/to/webpage/?lang=es
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd">
<html>
<head>
  <meta http-equiv="content-type" content="text/html; charset=utf-8">
  <title>test of accents</title>
</head>
<body>
  <h1><TMPL_VAR LANG></h1>
  <p><TMPL_VAR MSG></p>
</body>
</html>
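A side note on the language selection in the script: a browser that sends a language with no entry in %text (say fr) would get an undefined message back. The sketch below shows one way to fall back to English in that case. pick_lang and @supported are hypothetical names introduced only for illustration, and the header parsing is deliberately as naive as the original (first two letters only).

```perl
use strict;
use warnings;

# Hypothetical list of languages that have translations so far.
my @supported = qw(en es pt);

# Hypothetical helper: prefer an explicit choice, then the first two
# letters of the Accept-Language header, and fall back to English.
sub pick_lang {
    my ($explicit, $accept_language) = @_;
    my %known = map { $_ => 1 } @supported;

    my @candidates = grep { defined && length } (
        $explicit,
        substr(lc($accept_language // ''), 0, 2),
    );
    for my $candidate (@candidates) {
        my $lang = lc $candidate;
        return $lang if $known{$lang};
    }
    return 'en';
}

print pick_lang(undef, 'es-MX,es;q=0.9'), "\n";
print pick_lang('pt',  'en-US'),          "\n";
print pick_lang(undef, 'fr-FR'),          "\n";
```

The three calls should print es, pt and en respectively. In the script this would stand in for the `||` chain, for example pick_lang($cgi->param('lang'), $ENV{"HTTP_ACCEPT_LANGUAGE"}).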
I know my solution lies somewhere at the intersection of content-negotiation, unicode, and such mysteries that I know little about. I need to host this on a shared web server (aka, plain-vanilla, not-in-my-control webserver). The application also involves a database in which stuff is stored, and that stuff also has accents which similarly get clobbered when they are displayed in a web form and then updated. What can I do so this doesn't happen?
Update: Background explanation in the hope that it might lead to a better solution -- I am making an application that will be served in many different languages as far as the interface text is concerned. I could, of course, make separate html templates for each language, add language suffixes (.en, .es, .pt and so on), make sure the text in each of the templates is html escaped (why some needs to be while other doesn't still escapes me!), and serve based on $ENV{"HTTP_ACCEPT_LANGUAGE"} or explicitly chosen language or whatever. The problem is that I would have to maintain all those different templates. Make a change in one, and I would have to make a change in all. By making one template, and substituting the text strings accordingly... well, you get the idea... it is a lot better... one template to maintain. Right now I have 3 languages. I will probably end up getting 3 or 4 more languages.

In reply to accents and diacritical marks in a web page by punkish