This article does not discuss how Unicode works in SAP systems, but is a general overview intended for SAP users and developers.
This article summarizes the Wikipedia article on Unicode. To keep the overview short, some statements are simplified and their exceptions are not mentioned. Its goal is to give a general understanding.
See also SDN article - Unicode: Technical FAQs.
Objectives of Unicode
Unicode is a computing industry standard whose goal is to provide a single character set containing all characters used in the world, together with character encodings that define how these characters are stored as bytes in memory or on storage media.
Before Unicode, every combination of country, software or hardware manufacturer, and standards organization (thousands of combinations) had its own character set, so a given sequence of bytes could not be interpreted reliably without knowing its encoding. Moreover, even when the encoding was known, computers could not work with characters of other countries because each encoding was designed to handle only a limited number of characters: there are thousands of Chinese, Japanese, and Korean characters, whereas the single-byte encodings used on Western computers can represent at most 256 characters.
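The ambiguity described above can be illustrated with a short Python sketch. The byte value and the two legacy encodings (ISO 8859-1 and Windows-1251) are chosen here purely for illustration; they are not taken from the original article.

```python
# One raw byte, interpreted under two different legacy single-byte encodings.
raw = b"\xe9"

as_western = raw.decode("latin-1")   # ISO 8859-1 (Western European)
as_cyrillic = raw.decode("cp1251")   # Windows-1251 (Cyrillic)

# The same byte yields two unrelated characters.
print(f"latin-1: U+{ord(as_western):04X}")
print(f"cp1251:  U+{ord(as_cyrillic):04X}")
```

Without knowing which encoding the sender used, the receiver cannot tell which of the two characters was meant.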
Unicode provides room for about 1.1 million characters (1,114,112 code points), but the first 65,536 code points are enough to hold almost all characters currently used in the world, so 2 bytes are sufficient to store each of these most-used characters.
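The arithmetic behind these figures can be checked with a few lines of Python. This is only a sketch; the constant names are made up for the example.

```python
# Size of the Unicode code space and of its first 65,536 code points.
LAST_CODE_POINT = 0x10FFFF    # highest code point Unicode allows
FIRST_BLOCK_SIZE = 0x10000    # 65,536 code points, addressable with 2 bytes

total_code_points = LAST_CODE_POINT + 1
blocks_of_65536 = total_code_points // FIRST_BLOCK_SIZE

# Number of bits needed to write the largest value in each range.
bits_first_block = (FIRST_BLOCK_SIZE - 1).bit_length()
bits_full_range = LAST_CODE_POINT.bit_length()

print(total_code_points, blocks_of_65536, bits_first_block, bits_full_range)
```

The first block fits exactly in 16 bits (2 bytes), while the full range needs 21 bits, which is why the less common characters require more than 2 bytes.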
A Unicode character is identified by a code point, written as a hexadecimal number preceded by U+. For example, U+0041 represents the Latin capital letter "A".
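As a minimal sketch of the U+ notation, the following Python helpers convert between a character and its code point. The function names are hypothetical and exist only for this example; `ord` and `chr` are Python built-ins.

```python
def to_u_notation(ch: str) -> str:
    """Return the U+ notation of a single character (at least 4 hex digits)."""
    return f"U+{ord(ch):04X}"


def from_u_notation(text: str) -> str:
    """Return the character named by a string such as 'U+0041'."""
    return chr(int(text[2:], 16))


print(to_u_notation("A"))          # U+0041
print(from_u_notation("U+0041"))   # A
```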
Unicode characters occupy the code points U+0000 to U+D7FF, U+E000 to U+FFFF, and U+010000 to U+10FFFF.
Note: the code points U+D800 to U+DFFF are not characters; these values are reserved as surrogates for technical reasons in the UTF-16 encoding.
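The ranges above translate directly into a range check. The sketch below uses a hypothetical helper name and also shows that Python's own UTF-16 codec refuses to encode a code point from the excluded D800 to DFFF range.

```python
def is_character_code_point(cp: int) -> bool:
    """True if cp lies in one of the ranges listed above."""
    return 0x0000 <= cp <= 0xD7FF or 0xE000 <= cp <= 0x10FFFF


print(is_character_code_point(0x0041))    # True
print(is_character_code_point(0xD800))    # False: reserved for UTF-16
print(is_character_code_point(0x110000))  # False: beyond U+10FFFF

# A lone value from the reserved range cannot be encoded as text.
try:
    chr(0xD800).encode("utf-16")
    surrogate_rejected = False
except UnicodeEncodeError:
    surrogate_rejected = True
print(surrogate_rejected)                 # True
```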
Unicode characters are represented as sequences of bytes using character encodings named Unicode Transformation Formats (UTF):
- UTF-8
- UTF-16, little or big endian
- UTF-32, little or big endian
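To make the difference between the UTF-16 and UTF-32 forms and their byte orders concrete, the following Python sketch encodes the letter "A" (U+0041) in each variant, and then a character beyond U+FFFF in UTF-16. The sample characters are chosen for illustration only.

```python
letter = "A"  # U+0041

utf16_be = letter.encode("utf-16-be").hex()  # big endian: high byte first
utf16_le = letter.encode("utf-16-le").hex()  # little endian: low byte first
utf32_be = letter.encode("utf-32-be").hex()
utf32_le = letter.encode("utf-32-le").hex()

print(utf16_be, utf16_le)   # 0041 4100
print(utf32_be, utf32_le)   # 00000041 41000000

# A character beyond U+FFFF needs two 16-bit units in UTF-16,
# both taken from the reserved D800 to DFFF range.
beyond_16_bits = "\U0001F600"
utf16_pair = beyond_16_bits.encode("utf-16-be").hex()
print(utf16_pair)           # d83dde00
```

The bytes are identical in content and differ only in order, so a reader of UTF-16 or UTF-32 data has to know (or detect) which byte order was used.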