Unicode meaning

yo͝onĭ-kōd
A character encoding standard for computer storage and transmission of the letters, characters, and symbols of most languages and writing systems.
noun
A computer standard for encoding characters. In its original design, each character is represented by sixteen bits. Whereas ASCII, a 7-bit encoding scheme, can represent only 128 characters (256 in its extended 8-bit forms), a 16-bit code provides 65,536 combinations, enabling Unicode to encode the letters of all written languages as well as thousands of characters in languages such as Japanese and Chinese.
A character code that defines every character in most of the world's written languages. Although commonly thought to be only a two-byte coding system, Unicode characters can use from one to four bytes to hold a Unicode "code point" (see below). The code point is a unique number for a character or for a symbol such as an accent mark or ligature. Unicode supports more than a million code points, which are written with a "U" followed by a plus sign and the number in hex; for example, the word "Hello" is written U+0048 U+0065 U+006C U+006C U+006F (see hex chart).

Character Encoding Schemes
There are several formats for storing Unicode code points. When combined with the byte order of the hardware (big endian or little endian), they are known officially as "character encoding schemes." They are also known by their UTF acronyms, which stand for "Unicode Transformation Format" or "Universal Character Set Transformation Format."

UTF-8, 16 and 32
The UTF-8 coding scheme is widely used because words from multiple languages and every type of symbol can be mixed together in the same message without having to reserve multiple bytes for every character as in UTF-16 or UTF-32. With UTF-8, if only ASCII text is required, a single byte is used per character with the high-order bit set to 0. For non-ASCII characters that require more than one byte, the high-order 1 bits of the first byte define how many bytes are used. See byte order, DBCS and emoji.

Unicode Coding Scheme   ISO 10646 Equivalent   Number of Bytes   Byte Order**
UTF-8                   --                     1-4               BE or LE
UTF-16                  UCS-2                  2                 BE or LE
UTF-16BE                UCS-2                  2                 BE
UTF-16LE                UCS-2                  2                 LE
UTF-32                  UCS-4                  4                 BE or LE
UTF-32BE                UCS-4                  4                 BE
UTF-32LE                UCS-4                  4                 LE
UTF-7*                  --                     1-4               BE or LE

*Pure ASCII (compatible with early 7-bit email systems).
**Byte Order (see byte order): BE = big endian, LE = little endian.
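As a rough illustration of the notation above, the following short Java sketch (Java is one of the environments named later in this entry) prints the U+ code points of "Hello" and shows that UTF-8 uses one byte per ASCII character but more for other symbols; the class name and the euro-sign sample character are arbitrary choices for the example.

    import java.nio.charset.StandardCharsets;

    public class CodePointDemo {
        public static void main(String[] args) {
            String hello = "Hello";
            // Print each code point in U+ notation: U+0048 U+0065 U+006C U+006C U+006F
            hello.codePoints().forEach(cp -> System.out.printf("U+%04X ", cp));
            System.out.println();

            // ASCII text needs one UTF-8 byte per character...
            System.out.println(hello.getBytes(StandardCharsets.UTF_8).length);    // 5

            // ...but a non-ASCII character such as the euro sign (U+20AC) needs three
            System.out.println("\u20AC".getBytes(StandardCharsets.UTF_8).length); // 3
        }
    }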
A set of standard coding schemes intended to replace the multiple coding schemes currently used worldwide. The Unicode Consortium developed the original standard, Unicode Transformation Format-16 (UTF-16), in 1991 as a standard coding scheme to support multiple complex alphabets such as Chinese, Devanagari (Hindi), Japanese, and Korean. In the Japanese language, for example, even the abbreviated Kanji writing system contains well over 2,000 written ideographic characters; the Hiragana and Katakana alphabets add considerably to the complexity.

As 7- and 8-bit coding schemes cannot accommodate such complex alphabets, computer manufacturers traditionally have taken proprietary approaches to this problem through the use of two linked 8-bit values. UTF-16 supports 65,536 (2^16) characters, which accommodates the most complex alphabets. Unicode accommodates pre-existing standard coding schemes, using the same byte values for consistency. For example, Unicode mirrors ASCII in UTF-7 and EBCDIC in UTF-EBCDIC, the latter specifically for IBM mainframes. UTF-8 supports any character in the Unicode range, using one to four octets (eight-bit bytes) to do so, depending on the symbol. UTF-32 uses four octets for each symbol but is rarely used because of its inherent inefficiency.

The Unicode Standard has been adopted by most, if not all, major computer manufacturers and software developers, and is required by modern standards such as CORBA 3.0, ECMAScript (JavaScript), Java, LDAP, WML, and XML. Unicode is developed in conjunction with the International Organization for Standardization (ISO) and the International Electrotechnical Commission (IEC), which also define the Universal Character Set (UCS), into which the UTF code sets map. UCS-4 is a four-octet code set into which UTF-32 maps, and UCS-2 is a two-byte code set into which UTF-16 maps. UCS-1 encodes all characters in byte sequences varying from one to five bytes. See also ASCII, code set, EBCDIC, IEC, and ISO.
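A minimal Java sketch of the octet counts described above, using an arbitrary CJK ideograph (U+4F8B) as the sample character; note that the "UTF-32BE" charset is not part of java.nio.charset.StandardCharsets and its availability is assumed here, though mainstream JDKs provide it.

    import java.nio.charset.Charset;
    import java.nio.charset.StandardCharsets;

    public class OctetCountDemo {
        public static void main(String[] args) {
            String ideograph = "\u4F8B";  // a single CJK ideograph, U+4F8B
            // UTF-8 is variable length: this character takes three octets
            System.out.println(ideograph.getBytes(StandardCharsets.UTF_8).length);      // 3
            // UTF-16 (big endian, no byte order mark) uses one 16-bit unit here
            System.out.println(ideograph.getBytes(StandardCharsets.UTF_16BE).length);   // 2
            // UTF-32 always uses four octets per code point (charset support assumed)
            System.out.println(ideograph.getBytes(Charset.forName("UTF-32BE")).length); // 4
        }
    }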
An international character set that was built to represent all characters using a 2-byte (16-bit) format. About 30,000 characters from languages around the globe have been assigned code values in a format agreed upon internationally. The programming language Java and the Windows operating system use Unicode characters by storing them in memory as 16-bit values. In the C/C++ programming language, a character is 8 bits. In Windows and Java, “utilizing Unicode” means using UTF-16 as the character-encoding standard not only to manipulate text in memory but also to pass strings to APIs. Windows developers use the terms “Unicode string” and “wide string” (meaning “a string of 16-bit characters”) interchangeably. Orendorff, J. Unicode for Programmers (draft). [Online, March 1, 2002.] Orendorff Website. http://www.jorendorff.com/articles/unicode/index.html.
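To make the 16-bit "wide string" model concrete, here is a small illustrative Java example (class name assumed for the sketch): a character outside the original 16-bit range, such as the emoji U+1F600, occupies two 16-bit code units (a surrogate pair), so the code-unit count and the code-point count of a string can differ.

    public class WideStringDemo {
        public static void main(String[] args) {
            // 'A' is one 16-bit unit; U+1F600 needs a surrogate pair (two units)
            String s = "A\uD83D\uDE00";
            System.out.println(s.length());                       // 3 UTF-16 code units
            System.out.println(s.codePointCount(0, s.length()));  // 2 code points
        }
    }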