2021-05-10

CONVERT for UTF-8

CONVERT function for clob input parameter in database with WE8MSWIN1252 character does not convert to UTF-8 - result is looking like UTF-16. Conversion to database character set works OK.
Test script

host chcp 65001

select CONVERT(CHR(246), 'AL32UTF8') from dual;
select CONVERT(TO_CLOB(CHR(246)), 'AL32UTF8') from dual;

select dump(to_char(CONVERT(TO_CLOB(CHR(246)), 'AL32UTF8'))) clob_convert, dump(CONVERT(CHR(246), 'AL32UTF8')) vc_convert 
from dual;
host chcp 1252 

select CONVERT(CHR(195) || CHR(182) , 'WE8MSWIN1252', 'AL32UTF8') from dual;
select CONVERT(TO_CLOB(CHR(195) || CHR(182)), 'WE8MSWIN1252', 'AL32UTF8') from dual;

Results

SQL> host chcp 65001
Active code page: 65001

SQL> select CONVERT(CHR(246), 'AL32UTF8') from dual;
ö
SQL> select CONVERT(TO_CLOB(CHR(246)), 'AL32UTF8') from dual;

SQL> select dump(to_char(CONVERT(TO_CLOB(CHR(246)), 'AL32UTF8', 'WE8MSWIN1252'))) clob_convert, dump(CONVERT(CHR(246), 'AL32UTF8')) vc_convert
  2  from dual;
Typ=1 Len=2: 0,246 <--- UTF_16????
Typ=1 Len=2: 195,182

SQL> host chcp 1252
Active code page: 1252

SQL> select CONVERT(CHR(195) || CHR(182) , 'WE8MSWIN1252', 'AL32UTF8') from dual;
ö

SQL> select CONVERT(TO_CLOB(CHR(195) || CHR(182)), 'WE8MSWIN1252', 'AL32UTF8') from dual;
ö

Quote from Oracle docs

Note:Oracle discourages the use of the CONVERT function in the current Oracle Database release. The return value of CONVERT has a character data type, so it should be either in the database character set or in the national character set, depending on the data type. Any dest_char_set that is not one of these two character sets is unsupported. The char argument and the source_char_set have the same requirements. Therefore, the only practical use of the function is to correct data that has been stored in a wrong character set.
Values that are in neither the database nor the national character set should be processed and stored as RAW or BLOB. Procedures in the PL/SQL packages UTL_RAW and UTL_I18N—for example, UTL_RAW.CONVERT—allow limited processing of such values. Procedures accepting a RAW argument in the packages UTL_FILE, UTL_TCP, UTL_HTTP, and UTL_SMTP can be used to output the processed data.

Комментариев нет: