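MySQL LOAD DATA: Several Ways to Use It
I. MySQL LOAD Background
Database operations work inevitably involves processing text data and importing it into a database. This article walks through examples of scenarios that commonly come up during import and export.
Note:
The demo environment runs the following MySQL version: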
mysql Ver 14.14 Distrib 5.7.32, for linux-glibc2.12 (x86_64) using EditLine wrapper
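II. MySQL LOAD Basic Parameters
The examples in the rest of this article all use CSV sample data exported with the command below (a comma , as the field separator and a double quote " as the enclosing character).
The test table has the following structure: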
Create Table: CREATE TABLE `t_menu` (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `name` varchar(255) COLLATE utf8mb4_unicode_ci DEFAULT NULL COMMENT '菜单名称',
  `parent_id` int(11) DEFAULT '0' COMMENT '父菜单id',
  `level` int(11) DEFAULT '1' COMMENT '菜单等级,从1开始',
  `url` varchar(255) COLLATE utf8mb4_unicode_ci DEFAULT NULL COMMENT '菜单链接',
  `icon` varchar(255) COLLATE utf8mb4_unicode_ci DEFAULT NULL COMMENT '菜单图标',
  `order` int(11) DEFAULT NULL COMMENT '同级菜单顺序',
  `create_time` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP,
  `update_time` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
  `menu_type` int(3) DEFAULT '2' COMMENT '菜单类型:0:目录,1:页面,2:不区分(兼容老数据)',
  PRIMARY KEY (`id`),
  UNIQUE KEY `unique_menu_name_level_parent_id` (`name`,`level`,`parent_id`) USING BTREE
) ENGINE=InnoDB AUTO_INCREMENT=202 DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_unicode_ci
1 row in set (0.00 sec)
-- Basic export parameters
localhost "mgr01" 10:52:02 test01>select * into outfile '/data/mysql/tmp/b_menu.txt' character set utf8mb4 fields terminated by ',' enclosed by '"' lines terminated by '\n' from test01.b_menu limit 10;
Query OK, 10 rows affected (0.00 sec)

-- Basic import parameters (run once the target table `menu.tmp` created below exists)
load data infile '/data/mysql/tmp/b_menu.txt'
replace into table `menu.tmp`
character set utf8mb4
fields terminated by ','
enclosed by '"'
lines terminated by '\n';
[root@test ~]# cat /data/mysql/tmp/b_menu.txt
"1","核心数据指标","30","2","/index",\N,"1","2019-06-19 19:58:10","2019-10-31 20:27:37","1"
"2","拍机数据","29","2","/auction-dashboard",\N,"1","2019-06-19 19:58:24","2019-10-24 20:21:36","1"
"3","产品滞留数据","31","2","/product-dashboard",\N,"1","2019-06-19 19:58:42","2019-10-24 20:21:36","1"
"4","发货数据","42","3","/product-data",\N,"1","2019-08-29 17:44:35","2019-11-18 17:22:29","1"
"6","退租数据","14","2","/tuizushuju","","3","2019-09-25 19:05:47","2019-11-18 17:23:40","1"
"7","呆滞数据","14","2","/daizhishuju","","2","2019-09-25 19:12:29","2019-11-18 17:23:40","1"
"10","发货数据明细","14","2","/shujumingxi","","4","2019-09-25 19:15:37","2019-11-18 17:23:40","1"
"12","增率统计","32","3","/branch-dashboard",\N,"1","2019-09-26 21:23:16","2020-01-15 21:03:38","1"
"13","增率详细","32","3","/customer-dashboard",\N,"2","2019-09-26 21:23:46","2020-01-15 21:03:38","1"
"14","产品部数据","0","1","/svn7kezaqe9","","5","2019-09-29 21:58:09","2020-07-28 21:18:50","0"
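Both select ... into outfile and load data infile work with files on the server host, so the path has to be permitted by the server's secure_file_priv setting and the account needs the FILE privilege. A quick check before exporting might look like the sketch below; it assumes /data/mysql/tmp is a directory this server allows.

```sql
-- Empty value: no directory restriction.
-- NULL: server-side import and export are disabled.
-- A path: files must be located under that directory.
show variables like 'secure_file_priv';

-- Confirm that the current account has been granted FILE.
show grants for current_user();
```

Note also that into outfile refuses to overwrite an existing file, so a previous b_menu.txt has to be removed or a new file name chosen before re-running the export.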
Create the temporary test table menu.tmp:
CREATE TABLE `menu.tmp` (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `name` varchar(255) COLLATE utf8mb4_unicode_ci DEFAULT NULL COMMENT '菜单名称',
  `parent_id` int(11) DEFAULT '0' COMMENT '父菜单id',
  `level` int(11) DEFAULT '1' COMMENT '菜单等级,从1开始',
  `url` varchar(255) COLLATE utf8mb4_unicode_ci DEFAULT NULL COMMENT '菜单链接',
  `icon` varchar(255) COLLATE utf8mb4_unicode_ci DEFAULT NULL COMMENT '菜单图标',
  `order` int(11) DEFAULT NULL COMMENT '同级菜单顺序',
  `create_time` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP,
  `update_time` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
  `menu_type` int(3) DEFAULT '2' COMMENT '菜单类型:0:目录,1:页面,2:不区分(兼容老数据)',
  PRIMARY KEY (`id`),
  UNIQUE KEY `unique_menu_name_level_parent_id` (`name`,`level`,`parent_id`) USING BTREE
) ENGINE=InnoDB AUTO_INCREMENT=0 DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_unicode_ci;

localhost "mgr01" 10:59:07 test01>load data infile '/data/mysql/tmp/b_menu.txt' replace into table `menu.tmp` character set utf8mb4 fields terminated by ',' enclosed by '"' lines terminated by '\n';
Query OK, 10 rows affected (0.03 sec)
Records: 10 Deleted: 0 Skipped: 0 Warnings: 0

localhost "mgr01" 11:00:28 test01>select * from `menu.tmp`;
+----+--------------------+-----------+-------+---------------------+------+-------+---------------------+---------------------+-----------+
| id | name               | parent_id | level | url                 | icon | order | create_time         | update_time         | menu_type |
+----+--------------------+-----------+-------+---------------------+------+-------+---------------------+---------------------+-----------+
|  1 | 核心数指标         |        30 |     2 | /index              | NULL |     1 | 2019-06-19 19:58:10 | 2019-10-31 20:27:37 |         1 |
|  2 | 易机数据           |        29 |     2 | /auction-dashboard  | NULL |     1 | 2019-06-19 19:58:24 | 2019-10-24 20:21:36 |         1 |
|  3 | 产品滞留数据       |        31 |     2 | /product-dashboard  | NULL |     1 | 2019-06-19 19:58:42 | 2019-10-24 20:21:36 |         1 |
|  4 | 发货数据           |        42 |     3 | /product-data       | NULL |     1 | 2019-08-29 17:44:35 | 2019-11-18 17:22:29 |         1 |
|  6 | 退数据             |        14 |     2 | /tuizushuju         |      |     3 | 2019-09-25 19:05:47 | 2019-11-18 17:23:40 |         1 |
|  7 | 数据               |        14 |     2 | /daizhishuju        |      |     2 | 2019-09-25 19:12:29 | 2019-11-18 17:23:40 |         1 |
| 10 | 数据明细           |        14 |     2 | /shujumingxi        |      |     4 | 2019-09-25 19:15:37 | 2019-11-18 17:23:40 |         1 |
| 12 | 租率统计           |        32 |     3 | /branch-dashboard   | NULL |     1 | 2019-09-26 21:23:16 | 2020-01-15 21:03:38 |         1 |
| 13 | 租率详细           |        32 |     3 | /customer-dashboard | NULL |     2 | 2019-09-26 21:23:46 | 2020-01-15 21:03:38 |         1 |
| 14 | 产品部数据         |         0 |     1 | /svn7kezaqe9        |      |     5 | 2019-09-29 21:58:09 | 2020-07-28 21:18:50 |         0 |
+----+--------------------+-----------+-------+---------------------+------+-------+---------------------+---------------------+-----------+
10 rows in set (0.00 sec)
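In the exported file, \N stands for NULL and "" for an empty string, and the listing above shows both surviving the round trip in the icon column. A small sketch of a query that makes the distinction explicit (the column aliases are illustrative, not part of the original):

```sql
select id,
       icon is null as icon_is_null,   -- 1 for rows whose field was \N in the file
       icon = ''    as icon_is_empty   -- 1 for rows whose field was "", NULL when icon is NULL
from `menu.tmp`
order by id;
```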
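III. LOAD Scenario Examples
Scenario 1. The LOAD file has more fields than the table
Only part of the data in the text file needs to be imported into the table.
Create a temporary table that keeps only 4 of the columns: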
localhost "mgr01" 11:09:48 test01>create table menu_tmp01 as select id,name,level,url from `menu.tmp`;
ERROR 1786 (HY000): Statement violates GTID consistency: CREATE TABLE ... SELECT.
localhost "mgr01" 11:00:38 test01>create table `menu.tmp01` select id,name,level,url from `menu.tmp`;
ERROR 1786 (HY000): Statement violates GTID consistency: CREATE TABLE ... SELECT.
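The cause is that GTID is enabled on this MySQL server:
create table XXX as select * from XXX; is normally valid syntax, but MySQL 5.7.x rejects it when GTID consistency is enforced (enforce_gtid_consistency = ON, which gtid_mode = ON requires), as is the case here:
ERROR 1786 (HY000): Statement violates GTID consistency: CREATE TABLE ... SELECT.
Official documentation: https://dev.mysql.com/doc/refman/5.7/en/replication-gtids-restrictions.html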
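Where turning GTID off is not an option, a commonly used alternative is to split the statement in two: create the table with plain DDL, then fill it with insert ... select, which does not violate GTID consistency. A minimal sketch, assuming the four columns keep the same definitions they have in `menu.tmp`:

```sql
-- Step 1: plain DDL, allowed while enforce_gtid_consistency = ON
create table menu_tmp01 (
  `id` int(11) NOT NULL DEFAULT '0',
  `name` varchar(255) COLLATE utf8mb4_unicode_ci DEFAULT NULL,
  `level` int(11) DEFAULT '1',
  `url` varchar(255) COLLATE utf8mb4_unicode_ci DEFAULT NULL
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_unicode_ci;

-- Step 2: copy the rows in a separate statement
insert into menu_tmp01 (id, name, level, url)
select id, name, level, url from `menu.tmp`;
```

The rest of this section follows the other route and switches GTID off instead.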
There are two ways to turn GTID off once it is enabled:
Method 1: edit the MySQL configuration file my.cnf directly and restart the MySQL service:
gtid_mode = off
enforce_gtid_consistency = 0
Method 2: change the parameters online, one step at a time:
Trying to switch them off directly produces these errors:
localhost "mgr01" 11:15:36 test01>SET @@GLOBAL.ENFORCE_GTID_CONSISTENCY = off;
ERROR 1779 (HY000): GTID_MODE = ON requires ENFORCE_GTID_CONSISTENCY = ON.
localhost "mgr01" 11:16:49 test01> set global GTID_MODE = off;
ERROR 1788 (HY000): The value of @@GLOBAL.GTID_MODE can only be changed one step at a time: OFF <-> OFF_PERMISSIVE <-> ON_PERMISSIVE <-> ON. Also note that this value must be stepped up or down simultaneously on all servers. See the Manual for instructions.
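As the message indicates, going from ON to OFF means setting GTID_MODE=ON_PERMISSIVE first, then GTID_MODE=OFF_PERMISSIVE, and finally GTID_MODE=OFF. Going from OFF to ON uses the same steps in reverse order.
Continue with the step-by-step change: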
localhost "mgr01" 11:25:51 test01>set @@GLOBAL.GTID_MODE=ON_PERMISSIVE;
Query OK, 0 rows affected (0.03 sec)
localhost "mgr01" 11:25:52 test01>set @@GLOBAL.GTID_MODE=OFF_PERMISSIVE;
Query OK, 0 rows affected (0.01 sec)
If set @@GLOBAL.GTID_MODE=OFF_PERMISSIVE; fails, the error is usually this one:
mysql> set @@GLOBAL.GTID_MODE=OFF_PERMISSIVE;
ERROR 1766 (HY000): The system variable gtid_mode cannot be set when there is an ongoing transaction.
The error means the variable cannot be changed while a transaction is in progress, so run COMMIT first:
localhost "mgr01" 11:26:00 test01>commit;
Query OK, 0 rows affected (0.00 sec)
localhost "mgr01" 11:27:48 test01>set @@GLOBAL.GTID_MODE=OFF_PERMISSIVE;
Query OK, 0 rows affected (0.00 sec)
localhost "mgr01" 11:28:01 test01>set @@GLOBAL.GTID_MODE=OFF;
Query OK, 0 rows affected (0.02 sec)
localhost "mgr01" 11:28:19 test01> show variables like 'GTID_MODE';
+---------------+-------+
| Variable_name | Value |
+---------------+-------+
| gtid_mode     | OFF   |
+---------------+-------+
1 row in set (0.00 sec)
Then run SET GLOBAL ENFORCE_GTID_CONSISTENCY = off and check the result:
localhost "mgr01" 11:29:03 test01>show variables like 'ENFORCE_GTID_CONSISTENCY';
+--------------------------+-------+
| Variable_name            | Value |
+--------------------------+-------+
| enforce_gtid_consistency | OFF   |
+--------------------------+-------+
**At this point GTID has been turned off online.**
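Run the statements again (the backticks around `menu.tmp` are required; without them MySQL would look for a table tmp in a database named menu):
create table menu_tmp01 as select id,name,level,url from `menu.tmp`;
create table menu_tmp02 select id,name,level,url from `menu.tmp`;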
localhost "mgr01" 11:29:17 test01>create table menu_tmp01 as select id,name,level,url from `menu.tmp`;
Query OK, 10 rows affected (0.04 sec)
Records: 10 Duplicates: 0 Warnings: 0
localhost "mgr01" 11:30:10 test01>desc menu_tmp01;
+-------+--------------+------+-----+---------+-------+
| Field | Type         | Null | Key | Default | Extra |
+-------+--------------+------+-----+---------+-------+
| id    | int(11)      | NO   |     | 0       |       |
| name  | varchar(255) | YES  |     | NULL    |       |
| level | int(11)      | YES  |     | 1       |       |
| url   | varchar(255) | YES  |     | NULL    |       |
+-------+--------------+------+-----+---------+-------+
4 rows in set (0.00 sec)
localhost "mgr01" 11:30:20 test01>create table menu_tmp02 select id,name,level,url from `menu.tmp`;
Query OK, 10 rows affected (0.04 sec)
Records: 10 Duplicates: 0 Warnings: 0
localhost "mgr01" 11:30:45 test01>desc menu_tmp02;
+-------+--------------+------+-----+---------+-------+
| Field | Type         | Null | Key | Default | Extra |
+-------+--------------+------+-----+---------+-------+
| id    | int(11)      | NO   |     | 0       |       |
| name  | varchar(255) | YES  |     | NULL    |       |
| level | int(11)      | YES  |     | 1       |       |
| url   | varchar(255) | YES  |     | NULL    |       |
+-------+--------------+------+-----+---------+-------+
4 rows in set (0.00 sec)
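Once the tables exist, GTID can be switched back on by walking the same steps in reverse, with ENFORCE_GTID_CONSISTENCY restored first because GTID_MODE = ON requires it. A sketch of that sequence, assuming a single standalone server; in a replication topology each step has to be applied on every server before moving on to the next one:

```sql
set @@GLOBAL.ENFORCE_GTID_CONSISTENCY = on;
set @@GLOBAL.GTID_MODE = OFF_PERMISSIVE;
set @@GLOBAL.GTID_MODE = ON_PERMISSIVE;

-- Wait until this counter reports 0 before taking the final step.
show status like 'Ongoing_anonymous_transaction_count';

set @@GLOBAL.GTID_MODE = ON;
show variables like 'GTID_MODE';
```

Values changed with set global in MySQL 5.7 do not survive a restart, so my.cnf still decides what the server uses after the next restart.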
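Back to Scenario 1 (the LOAD file has more fields than the table): the following demonstrates importing only part of the data in the text file into the table.
-- Import statement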
load data infile '/data/mysql/tmp/b_menu.txt' replace into table test01.menu_tmp01 character set utf8mb4 fields terminated by ',' enclosed by '"' lines terminated by '\n' (@C1,@C2,@C3,@C4,@C5,@C6,@7,@8,@9,@10) set id=@C1, name=@C2, level=@C4, url=@C5;
Import the data:
load data infile '/data/mysql/tmp/b_menu.txt'
replace into table test01.menu_tmp01
character set utf8mb4
fields terminated by ','
enclosed by '"'
lines terminated by '\n'
(@C1,@C2,@C3,@C4,@C5,@C6,@7,@8,@9,@10) -- corresponds to the 10 columns of data in b_menu.txt
-- Only the 4 specified columns of the exported data are matched to table columns; the order in which the mappings are written does not affect the result
set id=@C1, name=@C2, level=@C4, url=@C5;

localhost "mgr01" 11:46:19 test01>load data infile '/data/mysql/tmp/b_menu.txt'
    -> replace into table test01.menu_tmp01
    -> character set utf8mb4
    -> fields terminated by ','
    -> enclosed by '"'
    -> lines terminated by '\n'
    -> (@C1,@C2,@C3,@C4,@C5,@C6,@7,@8,@9,@10) -- corresponds to the 10 columns of data in b_menu.txt
    -> -- Only the 4 specified columns of the exported data are matched to table columns; the order in which the mappings are written does not affect the result
    -> set id=@C1,
    -> name=@C2,
    -> level=@C4,
    -> url=@C5;
Query OK, 10 rows affected (0.01 sec)
Records: 10 Deleted: 0 Skipped: 0 Warnings: 0

localhost "mgr01" 11:46:26 test01>select * from menu_tmp01;
+----+--------------------+-------+---------------------+
| id | name               | level | url                 |
+----+--------------------+-------+---------------------+
|  1 | 核心数据指标       |     2 | /index              |
|  2 | 易机数据           |     2 | /auction-dashboard  |
|  3 | 产品滞留数据       |     2 | /product-dashboard  |
|  4 | 发货数据           |     3 | /product-data       |
|  6 | 退租数据           |     2 | /tuizushuju         |
|  7 | 呆滞数据           |     2 | /daizhishuju        |
| 10 | 发货数据明细       |     2 | /shujumingxi        |
| 12 | 增率统计           |     3 | /branch-dashboard   |
| 13 | 增率详细           |     3 | /customer-dashboard |
| 14 | 产品部数据         |     1 | /svn7kezaqe9        |
+----+--------------------+-------+---------------------+
10 rows in set (0.00 sec)
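Fields that are not needed do not each require their own variable. The column list of load data can mix table column names with user variables, and a variable that is never referenced afterwards simply discards its input field. A sketch of the same import written that way; @dummy is an arbitrary variable name chosen here:

```sql
load data infile '/data/mysql/tmp/b_menu.txt'
replace into table test01.menu_tmp01
character set utf8mb4
fields terminated by ','
enclosed by '"'
lines terminated by '\n'
-- file order: id, name, parent_id, level, url, icon, order, create_time, update_time, menu_type
(id, name, @dummy, level, url, @dummy, @dummy, @dummy, @dummy, @dummy);
```

This form keeps the positional mapping visible in a single line, at the cost of having to count fields carefully when the file layout changes.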
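Scenario 2. The LOAD file has fewer fields than the table
Note: the table contains every field present in the text file, plus additional columns.
Export only some of the columns of the MySQL table test01.b_menu to a text file: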
select id,name,url,create_time into outfile '/data/mysql/tmp/c_menu.txt' character set utf8mb4 fields terminated by ',' enclosed by '"' lines terminated by '\n' from test01.b_menu limit 10;

[root@test tmp]# cat /data/mysql/tmp/c_menu.txt
"1","核心数据指标","/index","2019-06-19 19:58:10"
"2","易机数据","/auction-dashboard","2019-06-19 19:58:24"
"3","产品滞留数据","/product-dashboard","2019-06-19 19:58:42"
"4","发货数据","/product-data","2019-08-29 17:44:35"
"6","退租数据","/tuizushuju","2019-09-25 19:05:47"
"7","呆滞数据","/daizhishuju","2019-09-25 19:12:29"
"10","发货数据明细","/shujumingxi","2019-09-25 19:15:37"
"12","增率统计","/branch-dashboard","2019-09-26 21:23:16"
"13","增率详细","/customer-dashboard","2019-09-26 21:23:46"
"14","产品部数据","/svn7kezaqe9","2019-09-29 21:58:09"
Create the test table a_menu:
CREATE TABLE `a_menu` (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `name` varchar(255) COLLATE utf8mb4_unicode_ci DEFAULT NULL COMMENT '菜单名称',
  `parent_id` int(11) DEFAULT '0' COMMENT '父菜单id',
  `level` int(11) DEFAULT '1' COMMENT '菜单等级,从1开始',
  `url` varchar(255) COLLATE utf8mb4_unicode_ci DEFAULT NULL COMMENT '菜单链接',
  `icon` varchar(255) COLLATE utf8mb4_unicode_ci DEFAULT NULL COMMENT '菜单图标',
  `order` int(11) DEFAULT NULL COMMENT '同级菜单顺序',
  `create_time` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP,
  `update_time` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
  `menu_type` int(3) DEFAULT '2' COMMENT '菜单类型:0:目录,1:页面,2:不区分(兼容老数据)',
  PRIMARY KEY (`id`),
  UNIQUE KEY `unique_menu_name_level_parent_id` (`name`,`level`,`parent_id`) USING BTREE
) ENGINE=InnoDB AUTO_INCREMENT=0 DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_unicode_ci;
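load data infile '/data/mysql/tmp/c_menu.txt'
replace into table test01.a_menu
character set utf8mb4
fields terminated by ','
enclosed by '"'
lines terminated by '\n'
(@C1,@C2,@C3,@C4,@C5,@C6,@7,@8,@9,@10) -- 10 variables, one per column of test01.a_menu; c_menu.txt has only 4 fields per line, so only @C1 to @C4 receive values
-- Only the 4 exported columns are matched to table columns; the order in which the mappings are written does not affect the result. The extra columns of a_menu are not handled and take their defined default values or NULL
set id=@C1,
    name=@C2,
    url=@C3,
    create_time=@C4; -- @C1 to @C4 are the 4 field values of each line of the exported file /data/mysql/tmp/c_menu.txt, in file order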
The SQL below is the correct form to run:
localhost "mgr01" 12:50:02 (none)>load data infile '/data/mysql/tmp/c_menu.txt' replace into table test01.a_menu character set utf8mb4 fields terminated by ',' enclosed by '"' lines terminated by '\n' (@C1,@C2,@C3,@C4,@C5,@C6,@7,@8,@9,@10) set id=@C1, name=@C2, url=@C3, create_time=@C4;
Query OK, 10 rows affected (0.02 sec)
Records: 10 Deleted: 0 Skipped: 0 Warnings: 0

localhost "mgr01" 12:50:23 (none)>select * from test01.a_menu;
+----+--------------------+-----------+-------+---------------------+------+-------+---------------------+---------------------+-----------+
| id | name               | parent_id | level | url                 | icon | order | create_time         | update_time         | menu_type |
+----+--------------------+-----------+-------+---------------------+------+-------+---------------------+---------------------+-----------+
|  1 | 核心数据指标       |         0 |     1 | /index              | NULL |  NULL | 2019-06-19 19:58:10 | 2021-03-27 12:50:23 |         2 |
|  2 | 易机数据           |         0 |     1 | /auction-dashboard  | NULL |  NULL | 2019-06-19 19:58:24 | 2021-03-27 12:50:23 |         2 |
|  3 | 产品滞留数据       |         0 |     1 | /product-dashboard  | NULL |  NULL | 2019-06-19 19:58:42 | 2021-03-27 12:50:23 |         2 |
|  4 | 发货数据           |         0 |     1 | /product-data       | NULL |  NULL | 2019-08-29 17:44:35 | 2021-03-27 12:50:23 |         2 |
|  6 | 退租数据           |         0 |     1 | /tuizushuju         | NULL |  NULL | 2019-09-25 19:05:47 | 2021-03-27 12:50:23 |         2 |
|  7 | 呆滞数据           |         0 |     1 | /daizhishuju        | NULL |  NULL | 2019-09-25 19:12:29 | 2021-03-27 12:50:23 |         2 |
| 10 | 发货数据明细       |         0 |     1 | /shujumingxi        | NULL |  NULL | 2019-09-25 19:15:37 | 2021-03-27 12:50:23 |         2 |
| 12 | 增率统计           |         0 |     1 | /branch-dashboard   | NULL |  NULL | 2019-09-26 21:23:16 | 2021-03-27 12:50:23 |         2 |
| 13 | 增率详细           |         0 |     1 | /customer-dashboard | NULL |  NULL | 2019-09-26 21:23:46 | 2021-03-27 12:50:23 |         2 |
| 14 | 产品部数据         |         0 |     1 | /svn7kezaqe9        | NULL |  NULL | 2019-09-29 21:58:09 | 2021-03-27 12:50:23 |         2 |
+----+--------------------+-----------+-------+---------------------+------+-------+---------------------+---------------------+-----------+
10 rows in set (0.00 sec)
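Since c_menu.txt carries exactly four fields per line, the variable list can also be cut down to those four, or replaced by the column names directly. A sketch of the shorter form, which should load the same rows under the same assumptions about file path and format:

```sql
load data infile '/data/mysql/tmp/c_menu.txt'
replace into table test01.a_menu
character set utf8mb4
fields terminated by ','
enclosed by '"'
lines terminated by '\n'
-- The four file fields in order; every other column of a_menu takes its default.
(id, name, url, create_time);
```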
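Scenario 3. Generating custom field data with LOAD:
As the Scenario 2 check shows, columns that exist only in the table are not processed during import. In the emp table used in the example below, the added columns fullname, modify_date, and delete_flag would be left as NULL. To populate such columns, LOAD can generate the data with functions supported by MySQL or with fixed values, and fields that do exist in the file can be passed through functions as well. Combined with export, this gives a simple ETL capability, as shown below: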
-- Import statement
load data infile '/data/mysql/3306/tmp/employees.txt'
replace into table demo.emp
character set utf8mb4
fields terminated by ','
enclosed by '"'
lines terminated by '\n'
(@C1,@C2,@C3,@C4,@C5,@C6) -- corresponds to the 6 columns of data in employees.txt
-- The part below explicitly maps table columns to fields in the data file; values that are not in the file are generated with functions (fixed values also work)
set emp_no=@C1,
    birth_date=@C2,
    first_name=upper(@C3), -- convert the imported value to upper case
    last_name=lower(@C4), -- convert the imported value to lower case
    fullname=concat(first_name,' ',last_name), -- concatenate first_name and last_name
    gender=@C5,
    hire_date=@C6,
    modify_date=now(), -- generate the current time
    delete_flag=if(hire_date<'1988-01-01','Y','N'); -- derive the value from a condition on another column
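The same idea can be applied to the a_menu table and the c_menu.txt file from Scenario 2. The sketch below fills columns that are missing from the file with a fixed value and with expressions; the specific rules (trimming the name, lower-casing the URL, deriving menu_type from the URL) are made up for illustration and do not come from the original data:

```sql
load data infile '/data/mysql/tmp/c_menu.txt'
replace into table test01.a_menu
character set utf8mb4
fields terminated by ','
enclosed by '"'
lines terminated by '\n'
(@C1,@C2,@C3,@C4)                                -- the 4 fields of c_menu.txt
set id=@C1,
    name=trim(@C2),                              -- strip surrounding spaces
    url=lower(@C3),                              -- normalize the URL to lower case
    create_time=@C4,
    parent_id=0,                                 -- fixed value
    menu_type=if(@C3 like '%-dashboard', 1, 2),  -- hypothetical rule based on the URL
    update_time=now();                           -- generated at load time
```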
Scenario 4. LOAD fixed-length data
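For fixed-length data, one common approach is to read each line into a single user variable and cut it into columns with substr. The sketch below rests on several assumptions: the file d_menu_fixed.txt and its layout (characters 1 to 4 id, 5 to 24 name, 25 to 26 level, 27 to 56 url, all space padded) are hypothetical, and the lines are assumed to contain no tab or backslash characters, so that each line arrives in the variable intact.

```sql
load data infile '/data/mysql/tmp/d_menu_fixed.txt'   -- hypothetical fixed-width file
into table test01.menu_tmp01
character set utf8mb4
fields terminated by '\t'       -- a separator assumed never to occur in the data
lines terminated by '\n'
(@row)                          -- the whole line lands in one variable
set id    = trim(substr(@row, 1, 4)),
    name  = trim(substr(@row, 5, 20)),
    level = trim(substr(@row, 25, 2)),
    url   = trim(substr(@row, 27, 30));
```

A non-empty field separator is used on purpose: with both the field terminator and the enclosing character empty, MySQL switches to its own fixed-row format, which cannot load into user variables.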
Reference:
https://mp.weixin.qq.com/s/WNXRshkvC3bFcc5NDaWlrw
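V. LOAD Summary
- By default, data is imported in the order of the text file: columns from left to right, rows from top to bottom
- If the table structure and the text data do not match, number the columns of the text file in order and map them explicitly to table columns, so that data does not end up in the wrong column
- When the text file to import is large, split it by line into several smaller files, for example with split
- After importing a file, run the statements below to check for warnings and errors and to confirm the number of rows imported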
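-- Run immediately after the load data statement. NUMBER is the count of conditions (errors and warnings) it raised, ROW_COUNT the number of affected rows
GET DIAGNOSTICS @p1=NUMBER,@p2=ROW_COUNT;
select @p1 AS ERROR_COUNT,@p2 as ROW_COUNT;
- When the text file differs too much from the table structure, or the data needs cleaning and transformation, use a dedicated ETL tool, or import the data roughly into MySQL first and transform it there.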
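To look at the individual problems rather than only their count, show warnings can be run right after the import, followed by a row count on the target table. A sketch against the menu_tmp01 table used earlier:

```sql
-- Run in the same session, directly after the load data statement.
show count(*) warnings;                   -- how many errors, warnings and notes were raised
show warnings limit 10;                   -- the first few messages with their codes
select count(*) from test01.menu_tmp01;   -- compare with the line count of the source file
```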