MySQL 开发组于 2019 年 10 月 14 日 正式发布了 MySQL 8.0.18 GA 版本,带来了一些新特性和增强功能。其中最引人注目的莫过于多表连接查询支持 hash join 方式了。我们先来看看官方的描述:

MySQL 实现了用于内连接查询的 hash join 方式。例如,从 MySQL 8.0.18 开始以下查询可以使用 hash join 进行连接查询:

SELECT *   FROM t1   JOIN t2     ON t1.c1=t2.c1;
CREATE TABLE t1 (c1 INT, c2 INT);CREATE TABLE t2 (c1 INT, c2 INT);CREATE TABLE t3 (c1 INT, c2 INT);
mysql> EXPLAIN FORMAT=TREE  -> SELECT *   ->   FROM t1   ->   JOIN t2   ->     ON t1.c1=t2.c1\G*************************** 1. row ***************************EXPLAIN: -> Inner hash join (t2.c1 = t1.c1) (cost=0.70 rows=1)  -> Table scan on t2 (cost=0.35 rows=1)  -> Hash    -> Table scan on t1 (cost=0.35 rows=1)

多个表之间使用等值连接的的查询也会进行这种优化。例如以下查询:

SELECT *   FROM t1  JOIN t2     ON (t1.c1 = t2.c1 AND t1.c2 < t2.c2)  JOIN t3     ON (t2.c1 = t3.c1);
mysql> EXPLAIN FORMAT=TREE  -> SELECT *   ->   FROM t1  ->   JOIN t2   ->     ON (t1.c1 = t2.c1 AND t1.c2 < t2.c2)  ->   JOIN t3   ->     ON (t2.c1 = t3.c1)\G*************************** 1. row ***************************EXPLAIN: -> Inner hash join (t3.c1 = t1.c1) (cost=1.05 rows=1)  -> Table scan on t3 (cost=0.35 rows=1)  -> Hash    -> Filter: (t1.c2 < t2.c2) (cost=0.70 rows=1)      -> Inner hash join (t2.c1 = t1.c1) (cost=0.70 rows=1)        -> Table scan on t2 (cost=0.35 rows=1)        -> Hash          -> Table scan on t1 (cost=0.35 rows=1)

但是,如果任何连接语句(ON)中没有使用等值连接条件,将不会采用 hash join 连接方式。例如:

mysql> EXPLAIN FORMAT=TREE  ->   SELECT *   ->     FROM t1  ->     JOIN t2   ->       ON (t1.c1 = t2.c1)  ->     JOIN t3   ->       ON (t2.c1 < t3.c1)\G*************************** 1. row ***************************EXPLAIN: <not executable by iterator executor>
mysql> EXPLAIN  ->   SELECT *   ->     FROM t1  ->     JOIN t2   ->       ON (t1.c1 = t2.c1)  ->     JOIN t3   ->       ON (t2.c1 < t3.c1)\G       *************************** 1. row ***************************      id: 1 select_type: SIMPLE    table: t1  partitions: NULL     type: ALLpossible_keys: NULL     key: NULL   key_len: NULL     ref: NULL     rows: 1   filtered: 100.00    Extra: NULL*************************** 2. row ***************************      id: 1 select_type: SIMPLE    table: t2  partitions: NULL     type: ALLpossible_keys: NULL     key: NULL   key_len: NULL     ref: NULL     rows: 1   filtered: 100.00    Extra: Using where; Using join buffer (Block Nested Loop)*************************** 3. row ***************************      id: 1 select_type: SIMPLE    table: t3  partitions: NULL     type: ALLpossible_keys: NULL     key: NULL   key_len: NULL     ref: NULL     rows: 1   filtered: 100.00    Extra: Using where; Using join buffer (Block Nested Loop)
mysql> EXPLAIN FORMAT=TREE  -> SELECT *  ->   FROM t1  ->   JOIN t2  ->   WHERE t1.c2 > 50\G*************************** 1. row ***************************EXPLAIN: -> Inner hash join (cost=0.70 rows=1)  -> Table scan on t2 (cost=0.35 rows=1)  -> Hash    -> Filter: (t1.c2 > 50) (cost=0.35 rows=1)      -> Table scan on t1 (cost=0.35 rows=1)

在全局或者会话级别设置服务器系统变量 optimizer_switch 中的 hash_join=on 或者 hash_join=off 选项。默认为 hash_join=on

在语句级别为特定的连接指定优化器提示 HASH_JOIN 或者 NO_HASH_JOIN。

可以通过系统变量 join_buffer_size 控制 hash join 允许使用的内存数量;hash join 不会使用超过该变量设置的内存数量。如果 hash join 所需的内存超过该阈值,MySQL 将会在磁盘中执行操作。需要注意的是,如果 hash join 无法在内存中完成,并且打开的文件数量超过系统变量 open_files_limit 的值,连接操作可能会失败。为了解决这个问题,可以使用以下方法之一:

增加 join_buffer_size 的值,确保 hash join 可以在内存中完成。

增加 open_files_limit 的值。

接下来他们比较一下 hash join block nested loop 的性能,首先分别为 t1、t2 和 t3 生成 1000000 条记录:

set join_buffer_size=2097152000;SET @@cte_max_recursion_depth = 99999999;INSERT INTO t1-- INSERT INTO t2-- INSERT INTO t3WITH RECURSIVE t AS ( SELECT 1 AS c1, 1 AS c2 UNION ALL SELECT t.c1 + 1, t.c1 * 2  FROM t  WHERE t.c1 < 1000000)SELECT * FROM t;
mysql> EXPLAIN ANALYZE  -> SELECT COUNT(*)  ->  FROM t1  ->  JOIN t2   ->   ON (t1.c1 = t2.c1)  ->  JOIN t3   ->   ON (t2.c1 = t3.c1)\G*************************** 1. row ***************************EXPLAIN: -> Aggregate: count(0) (actual time=22993.098..22993.099 rows=1 loops=1)  -> Inner hash join (t3.c1 = t1.c1) (cost=9952535443663536.00 rows=9952435908880402) (actual time=14489.176..21737.032 rows=1000000 loops=1)    -> Table scan on t3 (cost=0.00 rows=998412) (actual time=0.103..3973.892 rows=1000000 loops=1)    -> Hash      -> Inner hash join (t2.c1 = t1.c1) (cost=99682753413.67 rows=99682653660) (actual time=5663.592..12236.984 rows=1000000 loops=1)        -> Table scan on t2 (cost=0.01 rows=998412) (actual time=0.067..3364.105 rows=1000000 loops=1)        -> Hash          -> Table scan on t1 (cost=100539.40 rows=998412) (actual time=0.133..3395.799 rows=1000000 loops=1)1 row in set (23.22 sec)mysql> SELECT COUNT(*)  ->  FROM t1  ->  JOIN t2   ->   ON (t1.c1 = t2.c1)  ->  JOIN t3   ->   ON (t2.c1 = t3.c1);+----------+| COUNT(*) |+----------+| 1000000 |+----------+1 row in set (12.98 sec)
mysql> EXPLAIN FORMAT=TREE  -> SELECT /*+ NO_HASH_JOIN(t1, t2, t3) */ COUNT(*)  ->  FROM t1  ->  JOIN t2   ->   ON (t1.c1 = t2.c1)  ->  JOIN t3   ->   ON (t2.c1 = t3.c1)\G*************************** 1. row ***************************EXPLAIN: <not executable by iterator executor>1 row in set (0.00 sec)SELECT /*+ NO_HASH_JOIN(t1, t2, t3) */ COUNT(*) FROM t1 JOIN t2   ON (t1.c1 = t2.c1) JOIN t3   ON (t2.c1 = t3.c1);

再看有索引时的 block nested loop 方法,增加索引:

mysql> CREATE index idx1 ON t1(c1);Query OK, 0 rows affected (7.39 sec)Records: 0 Duplicates: 0 Warnings: 0mysql> CREATE index idx2 ON t2(c1);Query OK, 0 rows affected (6.77 sec)Records: 0 Duplicates: 0 Warnings: 0mysql> CREATE index idx3 ON t3(c1);Query OK, 0 rows affected (7.23 sec)Records: 0 Duplicates: 0 Warnings: 0
mysql> EXPLAIN ANALYZE  -> SELECT COUNT(*)  ->  FROM t1  ->  JOIN t2   ->   ON (t1.c1 = t2.c1)  ->  JOIN t3   ->   ON (t2.c1 = t3.c1)\G*************************** 1. row ***************************EXPLAIN: -> Aggregate: count(0) (actual time=47684.034..47684.035 rows=1 loops=1)  -> Nested loop inner join (cost=2295573.22 rows=998412) (actual time=0.116..46363.599 rows=1000000 loops=1)    -> Nested loop inner join (cost=1198056.31 rows=998412) (actual time=0.087..25788.696 rows=1000000 loops=1)      -> Filter: (t1.c1 is not null) (cost=100539.40 rows=998412) (actual time=0.050..5557.847 rows=1000000 loops=1)        -> Index scan on t1 using idx1 (cost=100539.40 rows=998412) (actual time=0.043..3253.769 rows=1000000 loops=1)      -> Index lookup on t2 using idx2 (c1=t1.c1) (cost=1.00 rows=1) (actual time=0.012..0.015 rows=1 loops=1000000)    -> Index lookup on t3 using idx3 (c1=t1.c1) (cost=1.00 rows=1) (actual time=0.012..0.015 rows=1 loops=1000000)1 row in set (47.68 sec)mysql> SELECT COUNT(*)  ->  FROM t1  ->  JOIN t2   ->   ON (t1.c1 = t2.c1)  ->  JOIN t3   ->   ON (t2.c1 = t3.c1);+----------+| COUNT(*) |+----------+| 1000000 |+----------+1 row in set (19.56 sec)

Hash Join(无索引) Block Nested Loop(无索引) Block Nested Loop(有索引)
12.98 s 未返回 19.56 s

再增加一个 Oracle 12c 中无索引时 hash join 结果:1.282 s。

再增加一个 PostgreSQL 11.5 中无索引时 hash join 结果:6.234 s。

再增加一个 SQL 2017 中无索引时 hash join 结果:5.207 s。

总结

更多相关文章

  1. MySQL系列多表连接查询92及99语法示例详解教程
  2. Linux下MYSQL 5.7 找回root密码的问题(亲测可用)
  3. MySQL 什么时候使用INNER JOIN 或 LEFT JOIN
  4. android从服务器下载文件(php+apache+win7+MySql)
  5. 【有图】android通过jdbc连接mysql(附文件)
  6. Android(安卓)Studio3.0 新特性 ~ New Features in Android(安卓)
  7. Android(安卓)4.1的新特性介绍
  8. Android(安卓)7.0新特性
  9. Android(安卓)Studio 2.0正式版 新特性

随机推荐

  1. 35、键盘布局的tableLayout备份
  2. Android(安卓)Gallery子元素无法横向填满
  3. Android植物大战僵尸小游戏
  4. android返回HOME界面
  5. android textview部分字体变颜色
  6. android自带图片资源
  7. Android 珍藏(三)
  8. Android系统工具之Monkey自动化测试
  9. Android菜单实例
  10. [Android]Common Sreen Size of Android