What is Unicode?
- Unicode is a universal encoded character set that lets you store text from any language using a single character set.
- Unicode provides a unique code for every character, independent of platform, program, or language.
1. UTF-8 Encoding
- This is an 8-bit encoding: the code unit is one byte.
- It is a variable-width, multi-byte encoding.
- One Unicode character occupies 1 to 4 bytes; characters in the Basic Multilingual Plane need at most 3 bytes.
2. UTF-16 Encoding
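- This is a 16-bit encoding: the code unit is two bytes.
- One Unicode character occupies 2 bytes if it lies in the Basic Multilingual Plane; supplementary characters occupy 4 bytes (a surrogate pair).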
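The variable width of UTF-8 can be observed from ABAP itself. The following is a minimal sketch, assuming a Unicode system on which the conversion class CL_ABAP_CONV_OUT_CE is available; the report name and variable names are illustrative only.

```abap
REPORT z_utf8_width_sketch.  " hypothetical report name

DATA: conv  TYPE REF TO cl_abap_conv_out_ce,
      buf   TYPE xstring,
      bytes TYPE i.

* Convert a plain ASCII character to UTF-8 and count the bytes.
conv = cl_abap_conv_out_ce=>create( encoding = 'UTF-8' ).
conv->write( data = 'A' ).
buf   = conv->get_buffer( ).
bytes = xstrlen( buf ).
WRITE: / 'UTF-8 bytes for A:', bytes.

* Use a fresh converter for a non-ASCII character.
conv = cl_abap_conv_out_ce=>create( encoding = 'UTF-8' ).
conv->write( data = 'ü' ).
buf   = conv->get_buffer( ).
bytes = xstrlen( buf ).
WRITE: / 'UTF-8 bytes for ü:', bytes.
```

Under these assumptions the first count should be 1 and the second 2, because UTF-8 stores ASCII characters in one byte and accented Latin letters in two.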
Need for Unicode Compliance
In the past, SAP developers used various codes to encode characters of different alphabets, for example, ASCII, EBCDIC, or double-byte code pages.
- ASCII (American Standard Code for Information Interchange) encodes each character using 1 byte = 8 bits. This makes it possible to represent a maximum of 2^8 = 256 characters, to which the combinations [00000000, 11111111] are assigned. (Strictly, ASCII itself defines only 128 characters; the 8-bit code pages built on it use the full 256.) Common code pages are, for example, ISO 8859-1 for Western European or ISO 8859-5 for Cyrillic fonts.
- EBCDIC (Extended Binary Coded Decimal Interchange Code) also uses 1 byte to encode each character, which again makes it possible to represent 256 characters. EBCDIC 0697/0500 is an old IBM format that is used on AS/400 machines for Western European fonts, for example.
- Double-byte code pages require 1 or 2 bytes for each character. This allows you to form 2^16 = 65,536 combinations, of which usually only 10,000 to 15,000 are used. Double-byte code pages are, for example, SJIS for Japanese and BIG5 for traditional Chinese.
Using these character sets, you can account for each language relevant to the SAP System. However, problems occur if you want to merge texts from different, incompatible character sets in a central system. Equally, exchanging data between systems with incompatible character sets can produce unpredictable results.
One solution to this problem is to use a code comprising all characters used on earth. This code is called Unicode (ISO/IEC 10646) and uses at least 16 bits = 2 bytes, alternatively 32 bits = 4 bytes, per character.
Advantages of Unicode compliance
Although the conversion effort for the R/3 kernel and applications is considerable, the migration to Unicode provides great benefits in the long run:
- The Internet and consequently also mySAP.com are entirely based on Unicode, which thus is a basic requirement for international competitiveness.
- Unicode allows all R/3 users to install a central R/3 System that covers all business processes worldwide.
- Companies using different distributed systems frequently want to aggregate their worldwide corporate data. Without Unicode, they would be able to do this only to a limited degree.
- With Unicode, you can use multiple languages simultaneously on a single front-end computer.
- Unicode is required for cross-application data exchange without loss of data due to incompatible character sets. One way to present documents on the World Wide Web (WWW) is XML, for example.
In addition, if new characters are added to the Unicode character set, SAP can decide whether to represent these characters internally using 2 or 4 bytes.
Transaction Code to Perform the Unicode Compliance Check
The transaction code for the Unicode check is UCCHECK.
How to Set the Unicode Flag Active?
After resolving all the errors reported by the Unicode check, we can set the Unicode flag. This can be done in either of two ways:
- From transaction UCCHECK itself, by selecting the success message of the object and then clicking the Set Unicode Attribute button.
- Go directly to the object, then Go To -> Attributes -> Change Display -> Check the Unicode Flag Active -> Save -> Activate.
Some Common Errors in UCCHECK and Their Solutions
Code lines with an asterisk (*) in front are lines that we commented out in the existing code to make it Unicode-compliant.
Code lines without an asterisk are the lines inserted into the object or program to replace the functionality of the lines that were commented out during the Unicode compliance check.
Case - 1:
Error: Upload/Ws_Upload and Download/Ws_Download are obsolete, since they are not Unicode-enabled; use the class cl_gui_frontend_services
Cause: From Version 4.7, the function modules UPLOAD / WS_UPLOAD / DOWNLOAD / WS_DOWNLOAD are obsolete. They are replaced by GUI_UPLOAD / GUI_DOWNLOAD.
Solution: Replace every UPLOAD / WS_UPLOAD call with GUI_UPLOAD and every DOWNLOAD / WS_DOWNLOAD call with GUI_DOWNLOAD. Match all importing, exporting, and exception parameters between the old and the new function module. If an exception of UPLOAD does not exist in GUI_UPLOAD, omit it.
Note: The parameter FILENAME in GUI_UPLOAD is of type STRING, whereas in UPLOAD / WS_UPLOAD (used before Version 4.7) it is a character field. After replacing UPLOAD / WS_UPLOAD with GUI_UPLOAD, this difference must be handled as well. Otherwise, an Extended Syntax Check with transaction SLIN reports a Call Function Interface Error. To avoid that, declare a variable of type STRING, move the file name into it, and pass that variable to the parameter FILENAME.
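The replacement described in the note can be sketched as follows. This is an illustration only: the report name, the selection parameter P_FILE, the table of strings, and the reduced exception list are assumptions chosen for brevity, not the full interface of GUI_UPLOAD.

```abap
REPORT z_gui_upload_sketch.  " hypothetical report name

* Character-type file name, as older programs typically declare it.
PARAMETERS p_file TYPE rlgrap-filename.

DATA: lv_filename TYPE string,
      lt_lines    TYPE STANDARD TABLE OF string.

* GUI_UPLOAD expects FILENAME as STRING, so copy the value first.
lv_filename = p_file.

CALL FUNCTION 'GUI_UPLOAD'
  EXPORTING
    filename        = lv_filename
    filetype        = 'ASC'
  TABLES
    data_tab        = lt_lines
  EXCEPTIONS
    file_open_error = 1
    file_read_error = 2
    OTHERS          = 3.

IF sy-subrc <> 0.
  WRITE: / 'Upload failed, sy-subrc =', sy-subrc.
ENDIF.
```

Copying the value into a STRING variable keeps the existing selection-screen parameter unchanged while satisfying the interface check in SLIN.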
Case - 2:
Type - I
Error: You cannot use an offset without a length declaration for parameter "HEAD+2".
Cause: When passing a parameter, we must pass either the variable as a whole or a section of it specified by both offset and length. Up to Version 4.6C, omitting the length did not cause an error; the system simply used the rest of the variable from the given offset. From Version 4.7, it is an error.
Solution: Whenever an offset is specified for a parameter, always specify the length as well.
* PERFORM APPEND_XFEBRE USING HEAD+2.
PERFORM APPEND_XFEBRE USING HEAD+2(98).
Type - II
Error: You cannot use ASSIGN f+offset. Always use an explicit length (or "*").
Cause: A Unicode program needs the exact length; an offset alone ("starting from the 2nd position") is not enough. Up to Version 4.6C, the system took the entire rest of the variable starting from that position. In a Unicode environment, this is not possible.
Solution: Specify the length explicitly, or use (*) as the length to address the rest of the field.
* ASSIGN STXT_LOC+1 TO <B>.
ASSIGN STXT_LOC+1(*) TO <B>.
Case - 3:
Error: In "TEXT MODE" the "ENCODING" addition must be specified.
Cause: Up to Version 4.6C, and in non-Unicode systems, it is not necessary to specify the encoding format when opening a dataset. From Version 4.7, it is mandatory.
Solution: Specify the encoding format in the OPEN DATASET statement itself. There are three encoding options.
... ENCODING (DEFAULT|UTF-8|NON-UNICODE)
This addition specifies the character representation in the file:
DEFAULT: Corresponds to UTF-8 in Unicode systems and to NON-UNICODE in non-Unicode systems.
UTF-8 : Characters are represented in the file in the format UTF-8.
NON-UNICODE : Characters are represented in the file in the code page defined by the text environment current at the time a READ or TRANSFER command is executed.
* OPEN DATASET FILENAME FOR INPUT IN TEXT MODE MESSAGE MSG.
OPEN DATASET FILENAME FOR INPUT IN TEXT MODE ENCODING DEFAULT MESSAGE MSG.
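For context, a complete read loop around such an OPEN DATASET statement might look like the sketch below. The report name and the file path are hypothetical, and error handling is reduced to the minimum.

```abap
REPORT z_open_dataset_read_sketch.  " hypothetical report name

DATA: lv_file     TYPE string VALUE '/tmp/unicode_demo.txt',  " hypothetical path
      lv_line     TYPE string,
      lv_msg(100) TYPE c.

* ENCODING is mandatory for TEXT MODE in a Unicode program.
OPEN DATASET lv_file FOR INPUT IN TEXT MODE ENCODING DEFAULT MESSAGE lv_msg.
IF sy-subrc <> 0.
  WRITE: / 'Open failed:', lv_msg.
ELSE.
  DO.
*   sy-subrc is set to a non-zero value at the end of the file.
    READ DATASET lv_file INTO lv_line.
    IF sy-subrc <> 0.
      EXIT.
    ENDIF.
    WRITE: / lv_line.
  ENDDO.
  CLOSE DATASET lv_file.
ENDIF.
```

With ENCODING DEFAULT the same source works on both Unicode and non-Unicode systems, which is usually the least intrusive choice when migrating existing programs.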
Case - 4:
Error: Generated Code for View Maintenance Dialog is not Unicode-Compatible. You can regenerate with the program RSVIMT_UC_VIEW_MAINT_GEN.
Cause: Views created with transaction SE11 can be used in various programs. When those programs are made Unicode-compatible, the generated maintenance code of these views becomes invalid because it is not Unicode-compatible.
Solution: Make the views Unicode-compatible by regenerating them with the program RSVIMT_UC_VIEW_MAINT_GEN. When executed, the program asks for the view name. Once you enter the name of the view to be made Unicode-compatible, it regenerates the view and adjusts every place where the view is used.
Case - 5:
Error : In Unicode, DESCRIBE LENGTH can only be used with the IN BYTE MODE or IN CHARACTER MODE addition.
Cause: In some cases, the syntax rules that apply to Unicode programs differ from those for non-Unicode programs. If you use the addition LENGTH, you must also use one of the two additions IN CHARACTER MODE or IN BYTE MODE in Unicode systems.
Solution: Specify the mode depending on the field type.
... IN CHARACTER MODE
This addition can only be used for character-type fields and in combination with the addition LENGTH. The length of the field f is determined in characters.
... IN BYTE MODE
This addition can only be used in combination with the addition LENGTH. The length of the field f is determined in bytes.
DATA LEN TYPE I.
* DESCRIBE FIELD FLD LENGTH LEN.
DESCRIBE FIELD FLD LENGTH LEN IN CHARACTER MODE.
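The difference between the two modes becomes visible when both are applied to the same field. The following sketch assumes a 10-character field; the report and variable names are illustrative.

```abap
REPORT z_describe_length_sketch.  " hypothetical report name

DATA: fld(10)  TYPE c,
      len_char TYPE i,
      len_byte TYPE i.

* Length counted in characters.
DESCRIBE FIELD fld LENGTH len_char IN CHARACTER MODE.

* Length counted in bytes; depends on the system's character width.
DESCRIBE FIELD fld LENGTH len_byte IN BYTE MODE.

WRITE: / 'Characters:', len_char,
       / 'Bytes:     ', len_byte.
```

On a Unicode system that stores each character in 2 bytes, the byte length would be expected to be twice the character length, whereas on a single-byte non-Unicode system the two values coincide. This is why the statement without a mode is ambiguous.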
Case - 6:
Error: IN... MODE was expected.
Cause: In programs without an active Unicode check, the file is opened for reading in binary mode if you do not use any additions for OPEN DATASET. In Unicode-enabled programs this is not possible; the mode must be specified.
Solution: In programs with an active Unicode check, you must specify the access type (such as ... FOR INPUT, ... FOR OUTPUT, and so on) and the mode (such as ... IN TEXT MODE, ... IN BINARY MODE, and so on). If the file is opened using ... IN TEXT MODE, you must also use the addition ... ENCODING. If the Unicode check is enabled, it is possible to use file names containing blanks.
* OPEN DATASET FILENAME FOR INPUT.
OPEN DATASET FILENAME FOR INPUT IN TEXT MODE ENCODING DEFAULT.
Case - 7:
Error: Could not specify the access range automatically. This means that you need a RANGE addition.
Cause: Loops with the VARY or VARYING addition can cause problems in Unicode because, on the one hand, you cannot be sure that you are accessing memory contents with the correct type and, on the other hand, memory could be inadvertently overwritten.
DO ... VARYING f FROM f1 NEXT f2 [RANGE f3].
WHILE ... VARY f FROM f1 NEXT f2 [RANGE f3].
With these statements, the fields f, f1 and f2 must be type-compatible with one another.
To avoid overwriting memory contents, a RANGE for valid accesses is implicitly or explicitly implemented for these statements.
Solution: If the RANGE f3 addition is specified, a syntax or runtime error is triggered if f1 or f2 is not contained in f3. For f3, only structures and elementary fields of the types C, N, or X are permitted.
If the RANGE addition is not specified, the range is defined implicitly from FROM f1 NEXT f2 as follows:
- If the syntax check recognizes that both f1 and f2 are components of the same structure, the valid RANGE area is the smallest structure containing f1 and f2.
- There is a syntax error if the syntax check recognizes that f1 and f2 are not part of the same structure.
- A valid range must be defined explicitly using RANGE if the syntax check cannot recognize that f1 and f2 belong together.
If a deep structure is defined as a RANGE addition, the system checks for every loop pass that there are no field references, object references, tables, or strings within the accessed range.
* DO 10 TIMES VARYING B FROM A+9(1) NEXT A+8(1).
DO 10 TIMES VARYING B FROM A+9(1) NEXT A+8(1) RANGE A.
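A structure is the most common choice for the RANGE addition. The sketch below assumes a flat structure with three character components; all names are illustrative.

```abap
REPORT z_varying_range_sketch.  " hypothetical report name

DATA: BEGIN OF text,
        word1(4) TYPE c VALUE 'AAAA',
        word2(4) TYPE c VALUE 'BBBB',
        word3(4) TYPE c VALUE 'CCCC',
      END OF text,
      word(4) TYPE c.

* The loop steps from word1 to word2 to word3. RANGE text confines
* every access to the structure, so the loop cannot run into
* memory outside of it.
DO 3 TIMES VARYING word FROM text-word1 NEXT text-word2 RANGE text.
  WRITE: / word.
ENDDO.
```

Because all three components have the same type and length as WORD, the type-compatibility requirement for f, f1, and f2 is met as well.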
Case - 8:
Error: One of the additions "FOR INPUT", "FOR OUTPUT", "FOR APPENDING" or "FOR UPDATE" was expected.
Cause: If OPEN DATASET is not executed in a Unicode Program, and if the user has write authorization for the file, the file is opened in read and write mode. Otherwise, it is only opened in read mode. But if the OPEN DATASET is executed in a Unicode Program, we need to mention in which mode it should open the dataset.
Solution: In the Unicode Environment, there are four modes in which the dataset can be opened.
... FOR INPUT
OPEN ... FOR INPUT opens the file in read mode.
... FOR OUTPUT
OPEN ... FOR OUTPUT opens the file in write mode.
If the file already exists, its existing content is deleted. If the file does not exist, the system creates it.
... FOR APPENDING
OPEN ... FOR APPENDING opens the file in append mode.
If the file already exists, its contents are retained, and the system moves to the end of the file. If the file does not exist, the system creates it. If the file was already open, the system moves to the end of the file.
... FOR UPDATE
OPEN ... FOR UPDATE opens the file in read and write mode.
Depending on the requirement, the developer should open the dataset in one of these four modes to make the program Unicode-compatible.
* OPEN DATASET P_DATASET IN TEXT MODE.
OPEN DATASET P_DATASET FOR INPUT IN TEXT MODE ENCODING DEFAULT.
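The difference between FOR OUTPUT and FOR APPENDING can be illustrated with a short sketch. The report name and file path are hypothetical, and the example assumes write authorization for that path on the application server.

```abap
REPORT z_open_dataset_write_sketch.  " hypothetical report name

DATA lv_file TYPE string VALUE '/tmp/unicode_demo.txt'.  " hypothetical path

* FOR OUTPUT creates the file or deletes its existing content.
OPEN DATASET lv_file FOR OUTPUT IN TEXT MODE ENCODING DEFAULT.
IF sy-subrc = 0.
  TRANSFER 'first line' TO lv_file.
  CLOSE DATASET lv_file.
ENDIF.

* FOR APPENDING keeps the content and writes at the end of the file.
OPEN DATASET lv_file FOR APPENDING IN TEXT MODE ENCODING DEFAULT.
IF sy-subrc = 0.
  TRANSFER 'second line' TO lv_file.
  CLOSE DATASET lv_file.
ENDIF.
```

After both blocks have run successfully, the file should contain two lines; running only the first block again would reset it to a single line.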
Case - 9:
Error: In the Unicode context, TRANSLATE... CODEPAGE/NUMBER FORMAT is not allowed.
Cause: In TRANSLATE statements, the additions FROM CODEPAGE and FROM NUMBER FORMAT are not allowed in a Unicode program. New conversion classes are provided as a possible replacement for the language elements TRANSLATE ... CODEPAGE and TRANSLATE ... NUMBER FORMAT ....
Solution: Data that is not available in ABAP format (that is, text data that is not in the system code page format, or numeric data that is not in the byte order used on the application server), is stored in an X field or XSTRING in binary form.
- When converting to an ABAP format from another format, data is read from a byte sequence and written to an ABAP data object.
- When converting from an ABAP format to another format, data is read from an ABAP data object and written as a byte sequence.
The classes available for this purpose are:
- CL_ABAP_CONV_IN_CE: Converting other formats to ABAP data objects. (Reading a binary input stream).
- CL_ABAP_CONV_OUT_CE: Converting ABAP data objects to another format. (Writing to a binary output stream).
- CL_ABAP_CONV_X2X_CE: Converting data from one format to another. (Reading from a binary input stream and writing to a binary output stream).
DATA: conv TYPE REF TO cl_abap_conv_out_ce.
conv = cl_abap_conv_out_ce=>create( endian = 'B' ).
* TRANSLATE <INT> TO NUMBER FORMAT '0000'.
CALL METHOD conv->write( data = <INT> ).
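The converted bytes are collected in the converter object and still have to be fetched from it. A minimal, self-contained sketch of the whole sequence follows; it assumes an integer variable in place of the field symbol, and the report and variable names are illustrative.

```abap
REPORT z_conv_out_sketch.  " hypothetical report name

DATA: conv   TYPE REF TO cl_abap_conv_out_ce,
      lv_int TYPE i VALUE 258,
      lv_buf TYPE xstring,
      lv_len TYPE i.

* 'B' requests big-endian byte order for numeric data.
conv = cl_abap_conv_out_ce=>create( endian = 'B' ).

* Append the integer to the converter's binary output buffer.
conv->write( data = lv_int ).

* Fetch the converted bytes as an XSTRING.
lv_buf = conv->get_buffer( ).
lv_len = xstrlen( lv_buf ).
WRITE: / 'Bytes written:', lv_len.
```

The XSTRING can then be written to a file opened IN BINARY MODE or passed to an interface that expects the external byte order.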