Oracle 数据库从 gbk 更换到 utf8 数据库兼容性问题解决

2019年04月10日 12:07 · 阅读(2998) ·

[目录]

参考资料

彻底弄懂 Unicode 编码

字符编码笔记:ASCII,Unicode 和 UTF-8

UTF-8

字符编码

Oracle 字符集的查看和修改

Oracle Database Migration Assistant for Unicode

业务背景

之前数据库的编码是 gbk,为了业务的国际化,需要把编码修改为 utf8
由于 gbk 和 utf8 对应汉字的存储字节不同,所以需要对之前的字段做兼容性处理

Oracal数据库当前字符编码

查询sql:select userenv('language') from dual;

查询结果:SIMPLIFIED CHINESE_CHINA.ZHS16GBK

gbk 和 utf8 对应字节

编码类型 英文字节 汉字字节
gbk 1 2
utf-8 1 3
  1. string strTmp = "h";
  2. int ascii = System.Text.Encoding.ASCII.GetBytes(strTmp).Length;
  3. int unicode = System.Text.Encoding.Unicode.GetBytes(strTmp).Length;
  4. int gbk = System.Text.Encoding.Default.GetBytes(strTmp).Length;
  5. int utf8 = System.Text.Encoding.UTF8.GetBytes(strTmp).Length;
  6. Console.WriteLine("ascii-----:" + ascii);
  7. Console.WriteLine("unicode---:" + unicode);
  8. Console.WriteLine("gbk-------:"+ gbk);
  9. Console.WriteLine("utf8------:" + utf8);
  10. Console.ReadKey();

结果:

  1. ascii-----:1
  2. unicode---:2
  3. gbk-------:1
  4. utf8------:1
  1. string strTmp = "汉";
  2. int ascii = System.Text.Encoding.ASCII.GetBytes(strTmp).Length;
  3. int unicode = System.Text.Encoding.Unicode.GetBytes(strTmp).Length;
  4. int gbk = System.Text.Encoding.Default.GetBytes(strTmp).Length;
  5. int utf8 = System.Text.Encoding.UTF8.GetBytes(strTmp).Length;
  6. Console.WriteLine("ascii-----:" + ascii);
  7. Console.WriteLine("unicode---:" + unicode);
  8. Console.WriteLine("gbk-------:"+ gbk);
  9. Console.WriteLine("utf8------:" + utf8);
  10. Console.ReadKey();

结果:

  1. ascii-----:1
  2. unicode---:2
  3. gbk-------:2
  4. utf8------:3

修改思路

从下表可知,gbk 和 utf8 英文都只占用1个字节
gbk 1个汉字占用2个字节,utf8 占用3个,3/2=1.5

编码类型 英文字节 汉字字节
gbk 1 2
utf-8 1 3

如果之前的 gbk 编码类型长度为 255,修改为 utf8 后,对应类型长度应该为

255*1.5 = 382.5

为了更好的兼容,我们修改为之前长度的两倍 255*2

使用 oracle dmu 得到需要扩容表和字段

什么是 Oracle Database Migration Assistant for Unicode?

Oracle Database Migration Assistant for Unicode (DMU) 是一款独特的下一代迁移工具,提供将数据库从传统编码迁移到 Unicode 的端到端解决方案。通过在整个迁移流程中为 DBA 提供指导并实现许多迁移任务的自动化,DMU 直观的用户界面极大简化了迁移流程并降低了对字符集迁移专业知识的要求。它采用可扩展的就地迁移架构,与传统的导出和导入迁移方法相比,可显著减少数据转换所需的工作和停机时间。对于迁移后的数据库和已经使用 Unicode 字符集的现有数据库,DMU 还提供了一种验证模式,可识别未正确用 Unicode 编码的数据,从而对数据库应用中的 Unicode 实现的潜在问题进行健康检查。

从 2.1 版开始,DMU 与 Oracle GoldenGate 复制技术结合使用,支持近乎零停机时间的迁移模型。结合使用 DMU 与 GoldenGate 12.1.2.1.0 或更高版本,您可以设置这样一个迁移过程,利用 DMU 的数据准备和就地转换功能,利用 GoldenGate 复制迁移过程中生产系统上发生的增量数据更改,从而有效地消除停机时间窗口需求。

思路

功能大致如下
1.全库扫描表字段,字段类型,字段值,字符编码
2.把对应表的字符编码从 gbk 转换为 utf8,如果之前字段的长度能够存储对应内容,就通过
3.如果之前字段的长度不能存储对应内容,比如存不下了,就会在 system.dum$exceptions 表里面添加对应表,行,列信息,这个字段就是需要扩展的字段

辅助查询

查询表信息

  1. --查询表信息
  2. select * from dba_objects
  3. where upper(object_name) ='Student'

查询表对应的列信息

  1. --查询表对应的列信息
  2. select * from dba_tab_columns
  3. where TABLE_NAME='Student'

查询兼容的异常信息

  1. --查询兼容的异常信息
  2. select decode(sde.type,
  3. 2,
  4. 'exceed column',
  5. 4,
  6. 'exceed datatype',
  7. 8,
  8. 'invalid') as "DMU_TYPE",
  9. sde.row_id, --表行Id
  10. sde.obj#, --表的Id
  11. sde.intcol# --表列Id
  12. from system.dum$exceptions sde
  13. where sde.obj# = 1301018 --表对象Id

获取表对应行,列对应的值方法

  1. create or replace function dmu_data_detail_f(p_object_id in number,
  2. p_column_id in number,
  3. p_rowid in varchar2)
  4. return varchar2 is
  5. l_data_detail varchar2(4000);
  6. l_data_query_sql VARCHAR2(2000);
  7. l_column_name VARCHAR2(128);
  8. l_table_name VARCHAR2(128);
  9. l_owner_name VARCHAR2(128);
  10. begin
  11. select dtc.column_name
  12. into l_column_name
  13. from dba_objects do, dba_tab_columns dtc
  14. where do.object_name = dtc.TABLE_NAME
  15. and do.OWNER = dtc.OWNER
  16. and dtc.COLUMN_ID = p_column_id
  17. and do.object_id = p_object_id;
  18. select object_name, owner
  19. into l_table_name, l_owner_name
  20. from dba_objects
  21. where object_id = p_object_id;
  22. l_data_query_sql := 'select ' || l_column_name || ' from ' ||
  23. l_owner_name || '.' || l_table_name ||
  24. ' where rowid=''' || p_rowid || '''';
  25. --dbms_output.put_line(l_data_query_sql);
  26. execute immediate l_data_query_sql
  27. into l_data_detail;
  28. return(l_data_detail);
  29. end dmu_data_detail_f;

查询对应表不兼容的字段-记录数少时使用

  1. --当查询结果记录数少时使用
  2. select owner as "用户",
  3. object_name as "对象名",
  4. object_type as "对象类型",
  5. COLUMN_NAME as "数据列",
  6. DATA_TYPE as "数据类型",
  7. DMU_TYPE as "扩展类型",
  8. dmu_data_detail_f(obj#, intcol#, row_id) as "例外数据"
  9. from (select do.owner,
  10. do.object_name,
  11. do.object_type,
  12. dto.COLUMN_NAME,
  13. dto.DATA_TYPE || '(' || dto.DATA_LENGTH || ')' as "DATA_TYPE",
  14. decode(sde.type,
  15. 2,
  16. 'exceed column',
  17. 4,
  18. 'exceed datatype',
  19. 8,
  20. 'invalid') as "DMU_TYPE",
  21. sde.row_id,
  22. sde.obj#,
  23. sde.intcol#
  24. from system.dum$exceptions sde,
  25. dba_objects do,
  26. dba_tab_columns dto
  27. where sde.obj# = do.object_id
  28. and sde.intcol# = dto.COLUMN_ID
  29. and do.object_name = dto.TABLE_NAME
  30. and do.OWNER = dto.OWNER
  31. and upper(do.object_name) = upper('&p_table_name')
  32. --and sde.type = &p_dmu_type
  33. order by OWNER, object_name, COLUMN_NAME, TYPE);

查询对应表不兼容的字段-数据量大时使用

  1. select '用户.表','字段名称',字段名称 ,255 data_length,cux_utf8_pub.count_byte_as_utf8(字段名称),rowid row_id
  2. from 用户.表 t where exists (
  3. select null
  4. from system.dum$exceptions sde,
  5. dba_objects do,
  6. dba_tab_cols dto
  7. where sde.obj# = do.object_id
  8. and sde.intcol# = dto.internal_column_id
  9. and do.object_name = dto.table_name
  10. and do.owner = dto.owner
  11. and do.owner = '用户'
  12. and dto.column_name = '字段名称'
  13. and upper(do.object_name) = '表名称'
  14. and sde.row_id = t.rowid
  15. );

修改 sql

从上面得到对应的表和不兼容的字段,对应修改该字段的字段长度或类型即可,例如

  1. -- =============================================
  2. -- Author: luoma
  3. -- Create date: 20194915:48:25
  4. -- Description: 字符编码从 SIMPLIFIED CHINESE_CHINA.ZHS16GBK 修改为 utf-8,所以对应字段要进行扩容
  5. -- gbkutf8 英文都占用1个字符,gbk中文占用2个字符,utf8占用3个,所以对应字段需要扩容为之前的1.5倍,这里我们修改为2
  6. -- =============================================
  7. alter table Student
  8. modify Name varchar2(383);

字符串截取

测试sql

  1. -- =============================================
  2. -- Author: luoma
  3. -- Create date: 201941110:50:24
  4. -- Description: utf8字符集截取对应表超过指定utf8编码长度的字符串
  5. -- =============================================
  6. declare
  7. l_ret_var varchar2(32767);--存储需要截取的字符串
  8. l_var varchar2(3); --存储截取的单个字符串
  9. l_len number := 0; --用于已经截取的utf8字符总长度
  10. l_dump number := 0; --存储单个utf8字符长度
  11. p_lengthb number := 25; --需要截取的字符长度
  12. begin
  13. --获取超过指定用UTF8字符集计算字符串字节数的数据
  14. for item in (select no,name from test_v_hwhao1 where (nvl(lengthb(name), 0) - nvl(length(name), 0)) + nvl(lengthb(name), 0) > p_lengthb )
  15. loop
  16. --初始化变量
  17. l_len := 0;
  18. l_ret_var := '';
  19. --处理超过指定长度的字段
  20. for i in 1 .. length(item.name)
  21. loop
  22. --截取单个字符串
  23. l_var := substr(item.name,i,1);
  24. --因为之前使用 gbk 编码,所以一个中文占用2个字符
  25. if (lengthb(l_var) = 2) then
  26. --utf8 一个中文字符长度为3
  27. l_dump := 3;
  28. else
  29. --utf8 其它字符长度为1
  30. l_dump := 1;
  31. end if;
  32. --已经总共统计了多少个 utf8 字符长度
  33. l_len := l_len + l_dump;
  34. --如果截取的长度还未达到需要截取的长度
  35. if(l_len <= p_lengthb) then
  36. --继续添加
  37. l_ret_var := l_ret_var || l_var;
  38. end if;
  39. --如果截取的长度达到了需要截取的长度
  40. if(l_len >= p_lengthb) then
  41. --把截取后的长度修改到表字段
  42. update test_v_hwhao1
  43. set name = l_ret_var
  44. where no = item.no;
  45. --提交修改
  46. commit;
  47. --退出循环
  48. exit;
  49. end if;
  50. end loop i;
  51. end loop;
  52. end;

测试表

测试完成后,结果应该为下图中红圈部分

select * from test_v_hwhao1 order by no

测试前

测试后

cux_utf8_pub

  1. create or replace package body cux_utf8_pub is
  2. --用UTF8字符集计算字符串字节
  3. function count_byte_as_utf8(p_text in varchar2) return number is
  4. l_char_lenth number := 0;
  5. l_byte_lenth number := 0;
  6. begin
  7. l_char_lenth := nvl(length(p_text), 0);
  8. l_byte_lenth := nvl(lengthb(p_text), 0);
  9. return(l_byte_lenth - l_char_lenth) + l_byte_lenth;
  10. end count_byte_as_utf8;
  11. --用UTF8字符集截取字符串,确保最后一位不是乱码
  12. function substrb_use_utf8(p_text in varchar2, p_lengthb in number)
  13. return varchar2 is
  14. l_ret_var varchar2(32767);
  15. l_var varchar2(3);
  16. l_len number := 0;
  17. l_dump number := 0;
  18. begin
  19. if (count_byte_as_utf8(p_text) <= p_lengthb) then
  20. return p_text;
  21. end if;
  22. for i in 1 .. length(p_text) loop
  23. l_var := substr(p_text, i, 1);
  24. if (lengthb(l_var) = 2) then
  25. l_dump := 3;
  26. else
  27. l_dump := 1;
  28. end if;
  29. l_len := l_len + l_dump;
  30. if (l_len <= p_lengthb) then
  31. l_ret_var := l_ret_var || l_var;
  32. end if;
  33. if (l_len >= p_lengthb) then
  34. return l_ret_var;
  35. end if;
  36. end loop i;
  37. return l_ret_var;
  38. end substrb_use_utf8;
  39. --用GBK字符集截取字符串,确保最后一位不是乱码
  40. function substrb_use_gbk(p_text in varchar2, p_lengthb in number)
  41. return varchar2 is
  42. l_last_string1 varchar2(4000);
  43. l_last_string2 varchar2(4000);
  44. begin
  45. if (lengthb(p_text) <= p_lengthb) then
  46. return p_text;
  47. end if;
  48. --截取逻辑
  49. if (length(substrb(p_text, p_lengthb, 2)) = 1) then
  50. --截取少一位
  51. return substrb(p_text, 1, p_lengthb - 1);
  52. else
  53. return substrb(p_text, 1, p_lengthb);
  54. end if;
  55. end substrb_use_gbk;
  56. procedure execute_substrb(p_owner in varchar2,
  57. p_table_name in varchar2,
  58. p_column_name in varchar2,
  59. p_debug_flag in varchar2 default 'Y') is
  60. l_bakup_sql varchar2(4000);
  61. l_datafix_sql varchar2(4000);
  62. l_count_length_sql varchar2(4000);
  63. l_extend_sql varchar2(4000);
  64. g_max_length number := 4000;
  65. l_length number;
  66. begin
  67. --截断处理
  68. for cur in (select t.rowid row_id, t.*, c.data_length
  69. from cux_utf8_substr_list t, dba_tab_cols c
  70. where t.table_name = c.table_name
  71. and t.owner = c.owner
  72. and t.column_name = c.column_name
  73. and c.data_type = 'VARCHAR2'
  74. and t.owner = nvl(p_owner, t.owner)
  75. and t.column_name = nvl(p_column_name, t.column_name)
  76. and t.table_name = nvl(p_table_name, t.table_name)
  77. and convert_type = 'CUT') loop
  78. l_bakup_sql := 'insert into cux_utf8_substr_bak
  79. (table_name, owner, column_name, bak_data, bak_date, data_row_id)
  80. select ''' || cur.table_name || ''',''' ||
  81. cur.owner || ''',''' || cur.column_name || ''',' ||
  82. cur.column_name || ',sysdate, t.rowid row_id from ' ||
  83. cur.owner || '.' || cur.table_name ||
  84. ' t where exists (select null
  85. from system.dum$exceptions sde,
  86. dba_objects do,
  87. dba_tab_cols dto
  88. where sde.obj# = do.object_id
  89. and sde.intcol# = dto.internal_column_id
  90. and do.object_name = dto.table_name
  91. and do.owner = dto.owner
  92. and do.owner = upper(''' || cur.owner || ''')
  93. and dto.column_name = upper(''' ||
  94. cur.column_name || ''')
  95. and upper(do.object_name) = upper(''' ||
  96. cur.table_name || ''')
  97. and sde.row_id = t.rowid)';
  98. --dbms_output.put_line(l_sql);
  99. /*insert into cux_utf8_substr_bak
  100. (table_name, owner, column_name, bak_data, bak_date)
  101. select cur.table_name,
  102. cur.owner,
  103. cur.column_name,
  104. dmu_data_detail_f(obj#, intcol#, row_id),
  105. sysdate
  106. from system.dum$exceptions sde,
  107. dba_objects do,
  108. dba_tab_cols dto
  109. where sde.obj# = do.object_id
  110. and sde.intcol# = dto.internal_column_id
  111. and do.object_name = dto.table_name
  112. and do.owner = dto.owner
  113. and do.owner = upper(cur.owner)
  114. and dto.column_name = upper(cur.column_name)
  115. and upper(do.object_name) = upper(cur.table_name);*/
  116. l_datafix_sql := 'update ' || cur.owner || '.' || cur.table_name ||
  117. ' t set ' || cur.column_name ||
  118. '=cux_utf8_pub.substrb_use_utf8(' || cur.column_name || ',' ||
  119. cur.data_length ||
  120. ') where exists (select null
  121. from system.dum$exceptions sde,
  122. dba_objects do,
  123. dba_tab_cols dto
  124. where sde.obj# = do.object_id
  125. and sde.intcol# = dto.internal_column_id
  126. and do.object_name = dto.table_name
  127. and do.owner = dto.owner
  128. and do.owner = upper(''' || cur.owner || ''')
  129. and dto.column_name = upper(''' ||
  130. cur.column_name || ''')
  131. and upper(do.object_name) = upper(''' ||
  132. cur.table_name || ''')
  133. and sde.row_id = t.rowid)';
  134. if (p_debug_flag = 'N') THEN
  135. update cux_utf8_substr_list t
  136. set t.process_flag = 'P',
  137. t.process_date_start = sysdate,
  138. t.datafix_sql = l_datafix_sql,
  139. t.bakup_sql = l_bakup_sql
  140. where rowid = cur.row_id;
  141. execute immediate l_bakup_sql;
  142. execute immediate l_datafix_sql;
  143. update cux_utf8_substr_list t
  144. set t.process_flag = 'S', t.process_date_end = sysdate
  145. where rowid = cur.row_id;
  146. commit;
  147. else
  148. dbms_output.put_line('bakup:');
  149. dbms_output.put_line(l_bakup_sql);
  150. dbms_output.put_line('');
  151. dbms_output.put_line('datafix:');
  152. dbms_output.put_line(l_datafix_sql);
  153. end if;
  154. end loop cur;
  155. --扩展处理
  156. for cur in (select t.rowid row_id, t.*, c.data_length
  157. from cux_utf8_substr_list t, dba_tab_cols c
  158. where t.table_name = c.table_name
  159. and t.owner = c.owner
  160. and t.column_name = c.column_name
  161. and c.data_type = 'VARCHAR2'
  162. and t.owner = nvl(p_owner, t.owner)
  163. and t.column_name = nvl(p_column_name, t.column_name)
  164. and t.table_name = nvl(p_table_name, t.table_name)
  165. and convert_type = 'EXT') loop
  166. l_count_length_sql := 'SELECT max(cux_utf8_pub.count_byte_as_utf8(' ||
  167. cur.column_name || ')) from ' || cur.owner || '.' ||
  168. cur.table_name || ' t' ||
  169. ' where exists (select null
  170. from system.dum$exceptions sde,
  171. dba_objects do,
  172. dba_tab_cols dto
  173. where sde.obj# = do.object_id
  174. and sde.intcol# = dto.internal_column_id
  175. and do.object_name = dto.table_name
  176. and do.owner = dto.owner
  177. and do.owner = upper(''' || cur.owner || ''')
  178. and dto.column_name = upper(''' ||
  179. cur.column_name || ''')
  180. and upper(do.object_name) = upper(''' ||
  181. cur.table_name || ''')
  182. and sde.row_id = t.rowid)';
  183. execute immediate l_count_length_sql
  184. into l_length;
  185. if (l_length > g_max_length) then
  186. dbms_output.put_line(cur.owner || '.' || cur.table_name || '.' ||
  187. cur.column_name || '扩展后的长度:' || l_length ||
  188. '超过4000');
  189. update cux_utf8_substr_list t
  190. set t.process_flag = 'E', t.process_date_start = sysdate
  191. where rowid = cur.row_id;
  192. commit;
  193. continue;
  194. end if;
  195. l_extend_sql := 'alter table ' || cur.owner || '.' || cur.table_name ||
  196. ' modify ' || cur.column_name || ' VARCHAR2(' ||
  197. l_length || ')';
  198. l_bakup_sql := 'alter table ' || cur.owner || '.' || cur.table_name ||
  199. ' modify ' || cur.column_name || ' VARCHAR2(' ||
  200. cur.data_length || ')';
  201. if (p_debug_flag = 'N') THEN
  202. update cux_utf8_substr_list t
  203. set t.process_flag = 'P',
  204. t.process_date_start = sysdate,
  205. t.extend_sql = l_extend_sql,
  206. t.bakup_sql = l_bakup_sql
  207. where rowid = cur.row_id;
  208. execute immediate l_extend_sql;
  209. update cux_utf8_substr_list t
  210. set t.process_flag = 'S', t.process_date_end = sysdate
  211. where rowid = cur.row_id;
  212. commit;
  213. else
  214. dbms_output.put_line('extend sql:');
  215. dbms_output.put_line(l_extend_sql);
  216. dbms_output.put_line('bakup_sql:');
  217. dbms_output.put_line(l_bakup_sql);
  218. end if;
  219. end loop cur;
  220. end execute_substrb;
  221. procedure get_quick_query_sql(p_owner in varchar2 default null,
  222. p_table_name in varchar2,
  223. p_column_name in varchar2 default null) is
  224. l_sql varchar2(4000);
  225. begin
  226. for cur in (select dto.data_length,
  227. do.object_name,
  228. do.owner,
  229. dto.column_name,
  230. do.object_id,
  231. dto.internal_column_id
  232. from dba_objects do,
  233. dba_tab_cols dto,
  234. cux_utf8_error_info cu
  235. where do.object_name = dto.table_name
  236. and do.owner = dto.owner
  237. and do.owner = cu.owner
  238. and dto.column_name = cu.column_name
  239. and do.object_name = cu.table_name
  240. and cu.owner = nvl(upper(p_owner), cu.owner)
  241. and cu.table_name = upper(p_table_name)
  242. and cu.column_name =
  243. nvl(upper(p_column_name), cu.column_name)) loop
  244. l_sql := 'select ''' || cur.owner || '.' || cur.object_name ||
  245. ''',''' || cur.column_name || ''',' || cur.column_name || ' ,' ||
  246. cur.data_length ||
  247. ' data_length,cux_utf8_pub.count_byte_as_utf8(' ||
  248. cur.column_name || '),rowid row_id from ' || cur.owner || '.' ||
  249. cur.object_name || ' t where exists (select null
  250. from system.dum$exceptions sde,
  251. dba_objects do,
  252. dba_tab_cols dto
  253. where sde.obj# = do.object_id
  254. and sde.intcol# = dto.internal_column_id
  255. and do.object_name = dto.table_name
  256. and do.owner = dto.owner
  257. and do.owner = ''' || cur.owner || '''
  258. and dto.column_name = ''' || cur.column_name || '''
  259. and upper(do.object_name) = ''' || cur.object_name || '''
  260. and sde.row_id = t.rowid);';
  261. dbms_output.put_line(l_sql);
  262. dbms_output.put_line('');
  263. end loop cur;
  264. end get_quick_query_sql;
  265. end cux_utf8_pub;