这些常被忽视的SQL错误用法,你知道吗( 三 )

关于 MySQL 外部条件不能下推的详细解释说明请参考以前文章:MySQL · 性能优化 · 条件下推到物化表 http://mysql.taobao.org/monthly/2016/07/08
7、提前缩小范围**
先上初始 SQL 语句:
SELECT * FROM my_order oLEFT JOIN my_userinfo uON o.uid = u.uid LEFT JOIN my_productinfo pON o.pid = p.pid WHERE ( o.display = 0 )AND ( o.ostaus = 1 ) ORDER BY o.selltime DESC LIMIT 0, 15 该SQL语句原意是:先做一系列的左连接,然后排序取前15条记录 。从执行计划也可以看出,最后一步估算排序记录数为90万,时间消耗为12秒 。
+----+-------------+-------+--------+---------------+---------+---------+-----------------+--------+----------------------------------------------------+| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |+----+-------------+-------+--------+---------------+---------+---------+-----------------+--------+----------------------------------------------------+| 1 | SIMPLE | o | ALL | NULL | NULL | NULL | NULL | 909119 | Using where; Using temporary; Using filesort || 1 | SIMPLE | u | eq_ref | PRIMARY | PRIMARY | 4 | o.uid | 1 | NULL || 1 | SIMPLE | p | ALL | PRIMARY | NULL | NULL | NULL | 6 | Using where; Using join buffer (Block Nested Loop) |+----+-------------+-------+--------+---------------+---------+---------+-----------------+--------+----------------------------------------------------+由于最后 WHERE 条件以及排序均针对最左主表,因此可以先对 my_order 排序提前缩小数据量再做左连接 。SQL 重写后如下,执行时间缩小为1毫秒左右 。
SELECT * FROM (SELECT * FROM my_order o WHERE ( o.display = 0 )AND ( o.ostaus = 1 ) ORDER BY o.selltime DESC LIMIT 0, 15) oLEFT JOIN my_userinfo uON o.uid = u.uidLEFT JOIN my_productinfo pON o.pid = p.pid ORDER BY o.selltime DESClimit 0, 15再检查执行计划:子查询物化后(select_type=DERIVED)参与 JOIN 。虽然估算行扫描仍然为90万,但是利用了索引以及 LIMIT 子句后,实际执行时间变得很小 。
+----+-------------+------------+--------+---------------+---------+---------+-------+--------+----------------------------------------------------+| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |+----+-------------+------------+--------+---------------+---------+---------+-------+--------+----------------------------------------------------+| 1 | PRIMARY | <derived2> | ALL | NULL | NULL | NULL | NULL | 15 | Using temporary; Using filesort || 1 | PRIMARY | u | eq_ref | PRIMARY | PRIMARY | 4 | o.uid | 1 | NULL || 1 | PRIMARY | p | ALL | PRIMARY | NULL | NULL | NULL | 6 | Using where; Using join buffer (Block Nested Loop) || 2 | DERIVED | o | index | NULL | idx_1 | 5 | NULL | 909112 | Using where |+----+-------------+------------+--------+---------------+---------+---------+-------+--------+----------------------------------------------------+8、中间结果集下推
再来看下面这个已经初步优化过的例子(左连接中的主表优先作用查询条件):
SELECT a.*,c.allocated FROM (SELECT resourceidFROM my_distribute dWHERE isdelete = 0AND cusmanagercode = '1234567'ORDER BY salecode limit 20) a LEFT JOIN(SELECT resourcesid, sum(ifnull(allocation, 0) * 12345) allocatedFROM my_resourcesGROUP BY resourcesid) c ON a.resourceid = c.resourcesid那么该语句还存在其它问题吗?不难看出子查询 c 是全表聚合查询,在表数量特别大的情况下会导致整个语句的性能下降 。
其实对于子查询 c,左连接最后结果集只关心能和主表 resourceid 能匹配的数据 。因此我们可以重写语句如下,执行时间从原来的2秒下降到2毫秒 。
SELECT a.*,c.allocated FROM (SELECT resourceidFROM my_distribute dWHERE isdelete = 0AND cusmanagercode = '1234567'ORDER BY salecode limit 20) a LEFT JOIN(SELECT resourcesid, sum(ifnull(allocation, 0) * 12345) allocatedFROM my_resources r,(SELECT resourceidFROM my_distribute dWHERE isdelete = 0AND cusmanagercode = '1234567'ORDER BY salecode limit 20) aWHERE r.resourcesid = a.resourcesidGROUP BY resourcesid) c ON a.resourceid = c.resourcesid但是子查询 a 在我们的SQL语句中出现了多次 。这种写法不仅存在额外的开销,还使得整个语句显的繁杂 。使用 WITH 语句再次重写:
WITH a AS (SELECT resourceidFROM my_distribute dWHERE isdelete = 0AND cusmanagercode = '1234567'ORDER BY salecode limit 20)SELECT a.*,c.allocated FROM a LEFT JOIN(SELECT resourcesid, sum(ifnull(allocation, 0) * 12345) allocatedFROM my_resources r,aWHERE r.resourcesid = a.resourcesidGROUP BY resourcesid) c ON a.resourceid = c.resourcesid总结
数据库编译器产生执行计划,决定着SQL的实际执行方式 。但是编译器只是尽力服务,所有数据库的编译器都不是尽善尽美的 。
上述提到的多数场景,在其它数据库中也存在性能问题 。了解数据库编译器的特性,才能避规其短处,写出高性能的SQL语句 。


推荐阅读