
Correct :) However, it was extended unofficially but consistently. See ASCII for both the ASCII standard (7-bit) and the industry ASCII extension (8-bit).

Jason L. Froebe

Team Sybase member

No one has seen what you have seen, and until that happens, we're all going to think that you're nuts. - Jack O'Neil, Stargate SG-1

Re^3: Detecting Strange Characters in Text?
by jhourcle (Prior) on Jun 17, 2005 at 03:36 UTC
    However, it was extended unofficially but consistently. See ASCII for both the ASCII standard (7-bit) and the industry ASCII extension (8-bit)
    <pedantic_mode>

    Which industry?

    ASCII is 7-bit, as specified in ANSI X3.4-1986.
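
As a rough illustration of the 7-bit point (a sketch in Python, not something from this thread; `find_non_ascii` is a made-up helper name): any byte with the high bit set, i.e. 0x80 or above, is by definition outside ASCII, so flagging such bytes could look something like this:

```python
def find_non_ascii(data: bytes) -> list:
    """Return (offset, byte value) pairs for every byte outside 7-bit ASCII."""
    return [(i, b) for i, b in enumerate(data) if b >= 0x80]


# Hypothetical sample: plain ASCII text with three high-bit bytes mixed in.
sample = b"caf\xe9 \x93quoted\x94"

for offset, value in find_non_ascii(sample):
    print(f"offset {offset}: 0x{value:02X}")
```

This only says that a byte is not ASCII. It says nothing about which character was intended, since that depends on the character set the sender had in mind. It also does not flag ASCII control characters below 0x20, which may be "strange" in their own right.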

    There are a number of 8-bit character sets that match ASCII in their first 128 characters, but there is no one official 'extended ASCII'. There are extended versions of ASCII, such as Latin-1, MacRoman, Windows-1252, etc., but no two of them are fully consistent with each other, and not a single one of them is ASCII.
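
To make the disagreement concrete, here is a small sketch in Python (the codec names `latin-1`, `mac_roman` and `cp1252` are Python's standard names for these character sets; `decode_each` is a hypothetical helper). It decodes one and the same byte under each of the three sets, then checks that the 7-bit range comes out identically in all of them:

```python
# The same byte, interpreted under three "extended ASCII" character sets.
ENCODINGS = ("latin-1", "mac_roman", "cp1252")


def decode_each(raw: bytes) -> dict:
    """Decode raw under every encoding in ENCODINGS, keyed by encoding name."""
    return {enc: raw.decode(enc) for enc in ENCODINGS}


results = decode_each(b"\x93")
for enc, text in results.items():
    print(f"{enc:10} U+{ord(text):04X}")

# The 7-bit range, by contrast, decodes identically under all three.
ascii_bytes = bytes(range(128))
same_below_128 = len({ascii_bytes.decode(enc) for enc in ENCODINGS}) == 1
print("identical below 0x80:", same_below_128)
```

Byte 0x93 is a left curly quote in Windows-1252, a C1 control code in Latin-1, and an accented letter in MacRoman, which is the kind of mismatch described above.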

    Calling Windows-1252 the 'industry ASCII extension' because it has all of the ASCII characters would be like calling Spanglish the 'standard English extension'. What about Australian? Chicano? Texan? Yes, they all have common roots and many similarities, and if you knew some other dialect, you could probably figure out most of what the other person was saying, but no one of them can claim to be the primary extension.

    </pedantic_mode>

    (This rant comes from years of dealing with e-mail support, and having to deal with people putting 'smiley face' characters in the subject line, which did bad things to an ANSI terminal or to modems with software flow control, and then having to deal with it all over again when Netscape and IE decided that '&#xxx;' was a good way to represent characters, never mind that Mac, Unix, and Windows machines all displayed different characters unless you stuck to specific ranges ... but MS Word can 'save as HTML' and you can keep your curly quotes! (So long as you're the one who looks at the page, so you'll never understand that other people aren't seeing the same thing displayed on their screen.))