Introduction
The CL_ABAP_CONV_OBJ class calls the CCC converter directly to convert bytes representing characters in one codepage into bytes representing characters in another codepage, and it offers many special options.
Most of the time, the following simpler classes should be used instead (they are the only ones SAP documents):
- CL_ABAP_CONV_IN_CE: converts bytes representing characters in a given codepage into a character or string variable
- CL_ABAP_CONV_OUT_CE: converts a character or string variable into bytes representing characters in a given codepage
- CL_ABAP_CONV_X2X_CE: converts bytes representing characters in a given codepage, into bytes representing characters in another given codepage
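As an illustration of the simpler classes listed above, here is a minimal sketch of a round trip between UTF-8 bytes and a string using CL_ABAP_CONV_IN_CE and CL_ABAP_CONV_OUT_CE. The variable names are hypothetical, and the sketch assumes a Unicode system in which the encoding name 'UTF-8' is accepted by the CREATE methods.

```abap
DATA lo_in    TYPE REF TO cl_abap_conv_in_ce.
DATA lo_out   TYPE REF TO cl_abap_conv_out_ce.
DATA l_bytes  TYPE xstring.
DATA l_bytes2 TYPE xstring.
DATA l_text   TYPE string.

* Hex code of "vérifier" in UTF-8 ("é" is coded on 2 bytes C3 A9)
l_bytes = '76C3A9726966696572'.

* Bytes -> text: decode the UTF-8 byte sequence into a string
lo_in = cl_abap_conv_in_ce=>create( encoding = 'UTF-8' ).
lo_in->convert( EXPORTING input = l_bytes
                IMPORTING data  = l_text ).

* Text -> bytes: encode the string back into UTF-8
lo_out = cl_abap_conv_out_ce=>create( encoding = 'UTF-8' ).
lo_out->convert( EXPORTING data   = l_text
                 IMPORTING buffer = l_bytes2 ).

* The round trip should give back the original bytes
ASSERT l_bytes2 = l_bytes.
```

These classes convert a complete buffer in one call, which is why they are usually sufficient when the whole input fits in memory.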
For more information about CL_ABAP_CONV_OBJ, see its SE24 documentation (a compilation exists, drawn from the CL_ABAP_CONV_OBJ SE24 documentation and from the documentation of the parameters' data elements).
Convert a big stream of bytes in chunks
This code may be useful when both of the following apply:
- a huge text file (for example 2 gigabytes, decompressed in ABAP from a ZIP file) has to be converted from a variable-width codepage (one that encodes a character on a varying number of bytes, such as UTF-8) into another codepage,
- there is insufficient memory to convert it in one piece.
The following code can be called to convert blocks of, for example, 10000 bytes at a time. Because a block boundary may fall in the middle of a character (a consequence of the variable number of encoding bytes), the INUSED parameter returns the number of bytes actually consumed: it might return 9999 instead of 10000, for example, if the last byte is only the beginning of a multi-byte character.
It is provided as a routine and a demo program to make it easier to use:
- Routine:
FORM partial_xstring_char_conv
  USING
    i_xsequence TYPE xsequence
    i_incode    TYPE cpcodepage
    i_outcode   TYPE cpcodepage
    i_length    TYPE i
  CHANGING
    c_xpos      TYPE i
    e_outbuff   TYPE any.
  " (you must make sure that c_xpos points to the first byte of
  " a character, otherwise the result will be wrong)
  STATICS so_conv_obj TYPE REF TO cl_abap_conv_obj.
  STATICS s_incode    TYPE cpcodepage.
  STATICS s_outcode   TYPE cpcodepage.
  DATA inused    TYPE i.
  DATA i         TYPE i.
  DATA l_xstring TYPE xstring.

  CLEAR e_outbuff.
  IF so_conv_obj IS NOT BOUND
        OR s_incode <> i_incode
        OR s_outcode <> i_outcode.
    CREATE OBJECT so_conv_obj
      EXPORTING
        incode  = i_incode
        outcode = i_outcode
        broken  = 'R'. "Conversion is canceled before any doubtful byte sequence
    s_incode = i_incode.
    s_outcode = i_outcode.
  ENDIF.
  i = XSTRLEN( i_xsequence ) - c_xpos.
  IF i > 0.
    IF i > i_length.
      i = i_length.
    ENDIF.
    l_xstring = i_xsequence+c_xpos(i).
    CALL METHOD so_conv_obj->convert
      EXPORTING
        inbuff    = l_xstring
        outbufflg = 0 "use the whole length of outbuff
      IMPORTING
        outbuff   = e_outbuff
        inused    = inused.
    ADD inused TO c_xpos.
  ENDIF.
ENDFORM.
- Demo program:
DATA l_xstring TYPE xstring.
DATA l_string  TYPE string.
DATA l_string2 TYPE string.
DATA l_pos     TYPE i.

* Below hex code is "En création, vérifier" in UTF-8
* (note that "é" is coded on 2 bytes C3 A9)
l_xstring = '456E206372C3A96174696F6E2C2076C3A9726966696572'.
DO.
* Convert 6 next bytes from codepage 4110 (UTF-8) into system codepage 0000 (so that
* to store the result in a character/string variable).
* First loop, l_string will be "En cr", and l_pos will be 5 (though 6 bytes had
* to be converted), because the 6th byte is the half character "é", so it can't be cut.
* Second loop, l_string will be "éatio" and l_pos will be 11.
* Third loop, l_string will be "n, vé" and l_pos will be 17.
* Fourth loop, l_string will be "rifier" and l_pos will be 23.
  PERFORM partial_xstring_char_conv
    USING l_xstring '4110' '0000' 6
    CHANGING l_pos l_string.
  IF l_string IS INITIAL.
    EXIT.
  ENDIF.
  CONCATENATE l_string2 l_string INTO l_string2.
ENDDO.
ASSERT l_string2 = 'En création, vérifier'.
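The demo above keeps the whole byte sequence in memory. When memory is the constraint, the same routine could be fed block by block, carrying the unconsumed tail of each block over to the next one. The following is only a sketch under stated assumptions: the input is a UTF-8 file on the application server, the path in l_file is hypothetical, and the FORM partial_xstring_char_conv from this page is available in the program.

```abap
* Assumption: l_file is a hypothetical application server path
DATA l_file   TYPE string VALUE '/tmp/big_utf8.txt'.
DATA l_chunk  TYPE xstring.
DATA l_rest   TYPE xstring.
DATA l_buffer TYPE xstring.
DATA l_text   TYPE string.
DATA l_pos    TYPE i.
DATA l_len    TYPE i.

OPEN DATASET l_file FOR INPUT IN BINARY MODE.
IF sy-subrc <> 0.
  RETURN.
ENDIF.
DO.
  " Read at most 10000 bytes; l_len receives the number actually read
  READ DATASET l_file INTO l_chunk MAXIMUM LENGTH 10000 ACTUAL LENGTH l_len.
  IF l_len = 0.
    EXIT.
  ENDIF.
  " Put the bytes left over from the previous block in front of the new ones
  CONCATENATE l_rest l_chunk INTO l_buffer IN BYTE MODE.
  l_len = XSTRLEN( l_buffer ).
  l_pos = 0.
  PERFORM partial_xstring_char_conv
    USING l_buffer '4110' '0000' l_len
    CHANGING l_pos l_text.
  " ... process l_text here (write it out, append it to a table, etc.) ...
  " Keep the tail that was not consumed (an incomplete character)
  IF l_pos < l_len.
    l_rest = l_buffer+l_pos.
  ELSE.
    CLEAR l_rest.
  ENDIF.
ENDDO.
CLOSE DATASET l_file.
```

If l_rest is not empty after the loop, the file ended in the middle of a character or contains bytes the converter could not handle. The sketch does not deal with that case.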