in reply to Re^3: substitution regex and unicode
in thread substitution regex and unicode

I understand what you mean and I tried it, but it doesn't work! On the contrary, if I leave my script untouched it works properly: the HTML page source (as shown by Firefox 2) shows the multibyte characters as they are, that is, it doesn't use the corresponding HTML entities. Sorry, I don't understand why I should encode the data again. Perhaps you mean that I should convert the data to binary if I send it back to the database.
This is the un-encoded output:
<!DOCTYPE html
	PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
	 "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" lang="en-US" xml:lang="en-US">
<head>
<title>xxx</title>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
</head>
<body>

r.1,1 1 ʾà-da-umtúg-Ⅱ 1 aktumtúg 1 íb-ivtúg sa₆ dar

r.1,2 NI-ra-arki

r.1,3 2 ʾà-da-umtúg-ii 1 ʾà-da-umtúg-i

and this is the encoded output:
<!DOCTYPE html
	PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
	 "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" lang="en-US" xml:lang="en-US">
<head>
<title>Progetto Sinleqiunnini</title>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
</head>
<body>

r.1,1 1 ʾà -da-umtúg-Ⅱ 1 aktumtúg 1 íb-ivtúg sa₆ dar

r.1,2 NI-ra-arki

r.1,3 2 ʾà -da-umtúg-ii 1 ʾà -da-umtúg-i

Even if these lines are not intelligible (it is a language of the third millennium BC), only the first example is correct.

Re^5: substitution regex and unicode
by ikegami (Patriarch) on May 07, 2008 at 22:41 UTC

    I never said you should encode data *again*. I said *characters* need to be encoded. Once a character is encoded, it becomes a series of bytes.
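
    To make the distinction concrete, here is a minimal sketch, assuming the core Encode module. The sample string is a fragment copied from the listing above, and the variable names are illustrative only, not taken from your program. It shows a character string becoming bytes, and what happens if those bytes are mistakenly treated as characters and encoded a second time; the garbled second listing looks consistent with that kind of double encoding, though without the code that is only a guess.

```perl
use strict;
use warnings;
use Encode qw(encode decode);

# A decoded string holds characters (code points), not bytes.
# U+02BE and U+00E0 are the first two characters of the sample line.
my $chars = "\x{02BE}\x{E0}-da-um";

# Encoding turns each character into one or more bytes.
my $bytes = encode('UTF-8', $chars);

printf "characters: %d\n", length $chars;   # 8
printf "bytes:      %d\n", length $bytes;   # 10

# Encoding the bytes again treats every byte as a Latin-1 character,
# so each byte >= 0x80 grows into two bytes (mojibake).
my $twice = encode('UTF-8', $bytes);
printf "encoded twice: %d bytes\n", length $twice;   # 14

# Decoding the correctly encoded bytes gives the characters back.
my $back = decode('UTF-8', $bytes);
print "round trip ok\n" if $back eq $chars;
```

    The usual pattern is to decode input once, work on characters inside the program, and encode exactly once on output.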

    I could elaborate, but it would help to know what I'm commenting on (i.e. to see the changes you've made to your program).