Unicode
A character code that assigns a number to the characters of most of the world's written languages. Although commonly thought to be only a two-byte coding system, a Unicode "code point" can occupy as little as one byte or as many as four bytes, depending on the encoding used. The code point is a unique number for a character or some character aspect such as an accent mark or ligature. Unicode supports more than a million code points (1,114,112 in all, from U+0000 to U+10FFFF), which are written with a "U" followed by a plus sign and the number in hex; for example, the word "Hello" is written U+0048 U+0065 U+006C U+006C U+006F (see hex chart).
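The U+ notation can be reproduced with a few lines of Python. This is only an illustrative sketch: the helper name `code_points` is made up for this example, and it relies on Python's built-in `ord`, which returns the code point of a single character.

```python
def code_points(text):
    """Return the U+XXXX notation for each character in text."""
    # ord() gives the code point as an integer; format it as
    # uppercase hex, zero-padded to at least four digits.
    return [f"U+{ord(ch):04X}" for ch in text]


print(" ".join(code_points("Hello")))
# prints: U+0048 U+0065 U+006C U+006C U+006F
```

Code points above U+FFFF simply come out with five or six hex digits, since the format pads to a minimum of four.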
Character Encoding Schemes
There are several formats for storing Unicode code points. When combined with the byte order of the hardware (big endian or little endian), they are known officially as "character encoding schemes." They are also known by their UTF acronyms, which stand for "Unicode Transformation Format" or "Universal Character Set Transformation Format." See byte order.
| Unicode Coding Scheme | ISO 10646 Equivalent | Number of Bytes | Byte Order** |
|---|---|---|---|
| UTF-8 | | 1-4 | BE or LE |
| UTF-16 | (UCS-2) | 2 or 4 | BE or LE |
| UTF-16BE | (UCS-2) | 2 or 4 | BE |
| UTF-16LE | (UCS-2) | 2 or 4 | LE |
| UTF-32 | (UCS-4) | 4 | BE or LE |
| UTF-32BE | (UCS-4) | 4 | BE |
| UTF-32LE | (UCS-4) | 4 | LE |
| UTF-7 (pure ASCII; compatible with early 7-bit e-mail systems) | | 1-4 | BE or LE |

**Byte order (see byte order): BE = big endian, LE = little endian
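The difference between the schemes listed above is easiest to see by encoding the same characters several ways. The sketch below uses Python's built-in codecs (`utf-8`, `utf-16-be`, `utf-16-le`, `utf-32-be`, `utf-32-le`); the helper name `hex_bytes` and the sample characters are assumptions chosen for illustration only.

```python
def hex_bytes(ch, scheme):
    """Encode ch with the named codec and return its bytes as spaced hex."""
    return " ".join(f"{b:02X}" for b in ch.encode(scheme))


SCHEMES = ["utf-8", "utf-16-be", "utf-16-le", "utf-32-be", "utf-32-le"]

# "A" is U+0041, the euro sign is U+20AC, and the emoji is U+1F600,
# which lies above U+FFFF.
for ch in ["A", "\u20ac", "\U0001F600"]:
    print(f"U+{ord(ch):04X}")
    for scheme in SCHEMES:
        print(f"  {scheme:<10} {hex_bytes(ch, scheme)}")
```

For the euro sign, UTF-16BE yields `20 AC` while UTF-16LE yields `AC 20`: the same two bytes in opposite order. The emoji takes four bytes in every scheme shown; in UTF-16 those four bytes form a surrogate pair (`D8 3D DE 00` in big-endian order).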
Computer Desktop Encyclopedia
Copyright © 1981-2009 by Computer Language Company Inc. All rights reserved.