in reply to Re^3: Encoding/decoding question
in thread Encoding/decoding question

I doubt that. I suspect the HTML was buggy too.

Could you show the HTML's HEAD element and the od -c output for réserve?

(Update: Hmm, .exe? You might not have od. Alternative: perl -nE"say unpack 'H*', $_ if /serv/;" file.html)

I once again recommend the uniquote program for such things. It is really way better than od or cat -v, because it shows you which characters are actually there, by code point or by name, rather than just raw bytes.
$ perl -Mutf8 -CS -wle 'print "réserve"' | uniquote
r\N{U+E9}serve
$ perl -Mutf8 -CS -wle 'print "réserve"' | uniquote -x
r\x{E9}serve
$ perl -Mutf8 -CS -wle 'print "réserve"' | uniquote -v
r\N{LATIN SMALL LETTER E WITH ACUTE}serve
$ perl -Mutf8 -CS -wle 'print "réserve"' | uniquote -b
r\xC3\xA9serve
$ perl -Mutf8 -CS -wle 'print "réserve"' | uniquote --xml
r&#xe9;serve
$ perl -Mutf8 -CS -wle 'print "réserve"' | uniquote --html
r&#233;serve
$ perl -Mutf8 -CS -wle 'print "réserve"' | uniquote --html --verbose
r&eacute;serve
$ perl -Mutf8 -CS -wle 'print "réserve"' | nfd | uniquote -v
re\N{COMBINING ACUTE ACCENT}serve
$ perl -Mutf8 -CS -wle 'print "réserve"' | iconv -f UTF-8 -t UTF-16 | uniquote --encoding=UTF-16 -x
r\x{E9}serve
$ perl -Mutf8 -CS -wle 'print "réserve"' | iconv -f UTF-8 -t UTF-16 | uniquote -b
\xFE\xFF\x00r\x00\xE9\x00s\x00e\x00r\x00v\x00e\x00
$ perl -Mutf8 -CS -wle 'print "réserve"' | iconv -f UTF-8 -t MacRoman | uniquote --encoding=MacRoman -x
r\x{E9}serve
$ perl -Mutf8 -CS -wle 'print "réserve"' | iconv -f UTF-8 -t MacRoman | uniquote -b
r\x8Eserve
$ perl -Mutf8 -CS -wle 'print "réserve"' > reserve.utf8
$ iconv -f UTF-8 -t MacRoman < reserve.utf8 > reserve.macroman
$ iconv -f UTF-8 -t UTF16-BE < reserve.utf8 > reserve.utf16be
$ uniwc reserve.{macroman,utf8,utf16be}
   Paras    Lines    Words   Graphs    Chars    Bytes File
       0        1        1        8        8        8 reserve.macroman
       0        1        1        8        8        9 reserve.utf8
       0        1        1        8        8       16 reserve.utf16be
$ uniquote reserve.{macroman,utf8,utf16be}
r\N{U+E9}serve
r\N{U+E9}serve
r\N{U+E9}serve
$ uniquote -b reserve.{macroman,utf8,utf16be}
r\x8Eserve
r\xC3\xA9serve
\x00r\x00\xE9\x00s\x00e\x00r\x00v\x00e\x00
See how nifty that is?

Re^5: Encoding/decoding question
by ikegami (Patriarch) on Sep 12, 2011 at 02:35 UTC
    Yes, uniquote -b produces output similar to od -c, but why would I have the user approximate what I want using a tool he doesn't have?
      > Yes, uniquote -b produces output similar to od -c, but why would I have the user approximate what I want using a tool he doesn't have?
      Because what you told him to do did not provide the information needed to diagnose this problem. Notice his response.

      Plus giving people tools empowers them to become fishermen.

Re^5: Encoding/decoding question
by Anonymous Monk on Sep 12, 2011 at 20:34 UTC
    That assumes the OP has a UTF-8-capable shell, doesn't it?