in reply to Re^3: iso-8859-1 code converter
in thread iso-8859-1 code converter

>>rovf
"Or did I misunderstand you here completely?"

Yes. I will get that from the page listed. I need the ASCII representation to do so though, and that's what I'm after.

>>graff
That seems to be on the right track. Never knew what ord() did. However, as before, I'm using utf8 or euc encoded strings, so ord returns the numeric representation of each byte, not each character. The above gives 6 distinct codes, not 3. My assumption is if I passed it iso-8859-1 encoded strings it might work, but my text editor (Kate on Linux) says they're not valid for that encoding, and chokes.

Wait, the values are actually just the HTML representation (the way an entity reference displays as ©, the copyright sign). I guess I should either plug my values into a calculator like the one below, or search for the table somewhere.
http://www.pinyin.info/tools/converter/chars2uninumbers.html

Sorry my confusion led to such a lengthy discussion. :?

Re^5: iso-8859-1 code converter
by graff (Chancellor) on May 06, 2009 at 03:22 UTC
    ... as before, I'm using utf8 or euc encoded strings...

    Okay, how do you know when you're using one or the other? Does the web site (play-asia.com) support both encodings, and if so, how do you tell it which one you're using?

    ord returns the numeric representation of each byte, not each character. The above gives 6 distinct codes, not 3.

    6 numerics instead of 3 for that 3-character string definitely means "not utf8" (undecoded utf8 would typically give 3 bytes per CJK character, so 9 numerics), so presumably it's euc, based on what you've said. If the web site is looking for the Unicode numerics, you should use Encode to decode the euc bytes into a character string, then use ord() on each character:

        use Encode;
        ...
        my $string = "...";  # wherever you get your euc string from
        $string = decode( "euc-jp", $string );  # now it's a utf8 string
        ...
        # plug it into the url as described earlier
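    To make the byte-versus-character difference concrete, here is a minimal sketch. The sample string is an assumption on my part (three kanji built from known code points, standing in for whatever you actually have), and the variable names are only for illustration:

```perl
use strict;
use warnings;
use Encode qw(encode decode);

# Hypothetical sample: build a 3-character EUC-JP byte string from known
# code points (U+65E5, U+672C, U+8A9E), standing in for the real input.
my $euc_bytes = encode( "euc-jp", "\x{65E5}\x{672C}\x{8A9E}" );

# On the raw bytes, ord() returns one value per byte.
my @byte_values = map { ord($_) } split //, $euc_bytes;

# After decode(), the string holds characters, so ord() returns
# one Unicode code point per character.
my $chars       = decode( "euc-jp", $euc_bytes );
my @code_points = map { ord($_) } split //, $chars;

print scalar(@byte_values), " byte values: @byte_values\n";
print scalar(@code_points), " code points: @code_points\n";
```

    The first line reports 6 values and the second reports 3 (26085 26412 35486), which matches the "6 distinct codes, not 3" you saw before decoding.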

    What does "rovf" mean, and am I missing your point(s) completely? (sorry-- I just realized that "rovf" was another monk -- moving right along...)

    Your descriptions and replies are a bit hard for me to follow. And what do you mean by "the values are actually just HTML representation"? There are numeric character entity references (both decimal and hex, used in HTML, XML, SGML), there are symbolic entity references (like "&aacute;" for "á"), and there are uri-encoded versions of these (with the punctuation marks converted to "%" followed by two hex digits).
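    In case it helps to see those forms side by side, here is a rough sketch that builds them for a single character using only core Perl. The choice of "á" (U+00E1) and the hand-rolled escaping are my own assumptions for illustration; I have no idea which of these forms the site actually expects:

```perl
use strict;
use warnings;

# Hypothetical example character: U+00E1 (a with acute accent).
my $char = "\x{E1}";
my $code = ord($char);

# Numeric character entity references, decimal and hex.
my $decimal_ref = sprintf( "&#%d;",  $code );
my $hex_ref     = sprintf( "&#x%X;", $code );

# URI-encoded version of the decimal reference: each punctuation mark
# becomes "%" followed by two hex digits (a simple hand-rolled escape).
( my $uri_encoded = $decimal_ref ) =~
    s/([^A-Za-z0-9_.~-])/sprintf( "%%%02X", ord($1) )/ge;

print "decimal:     $decimal_ref\n";
print "hex:         $hex_ref\n";
print "uri-encoded: $uri_encoded\n";
```

    This prints &#225;, &#xE1; and %26%23225%3B respectively. The symbolic form is a table lookup rather than arithmetic, so it is left out of the sketch.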

    Are you really still having a problem with this?