【分享】MySQL分组排序(执行效率对比)

在一个表中取出根据某个字段排序并根据另一个字段分组的若干条数据,你会怎么写SQL?

如有一张记录用户登录的表(用户每登录一次都会在表中记录),要查询最后3个登录的用户及时间。

假设表结构如下:

mysql> desc table_test;
+----------+------------------+------+-----+---------+----------------+
| Field	| Type			 | Null | Key | Default | Extra		  
+----------+------------------+------+-----+---------+----------------+
| Id	   | int(10) unsigned | NO   | PRI | NULL	| auto_increment 
| UserId   | int(10) unsigned | NO   |	 | NULL
| UserName | varchar(255)	 | NO   |	 | NULL
| Time	 | datetime		 | NO   |	 | NULL
+----------+------------------+------+-----+---------+----------------+
4 rows in set

数据如下:

mysql> select *
	-> from `table_test`;
+----+--------+----------+---------------------+
| Id | UserId | UserName | Time
+----+--------+----------+---------------------+
|  1 |   1223 | test1	| 2011-12-05 21:46:32 
|  2 |   1224 | test2	| 2011-12-13 21:46:56 
|  3 |   1225 | test3	| 2011-12-14 21:47:24 
|  4 |   1223 | test1	| 2011-12-16 21:47:37 
|  5 |   1223 | test1	| 2011-12-16 21:47:50 
|  6 |   1224 | test2	| 2011-12-17 21:48:02 
|  7 |   1225 | test3	| 2011-12-19 21:48:16 
|  8 |   1223 | test1	| 2011-12-21 21:50:58 
|  9 |   1226 | test4	| 2011-12-23 21:51:27 
| 10 |   1223 | test1	| 2011-12-24 21:52:05 
+----+--------+----------+---------------------+
10 rows in set
很容易写出这样的SQL:

mysql> select `UserId`, `UserName`, `Time`
	-> from `table_test`
	-> group by `UserId`
	-> order by `Time` desc
	-> limit 3;
+--------+----------+---------------------+
| UserId | UserName | Time
+--------+----------+---------------------+
|   1226 | test4	| 2011-12-23 21:51:27 
|   1225 | test3	| 2011-12-14 21:47:24 
|   1224 | test2	| 2011-12-13 21:46:56 
+--------+----------+---------------------+
3 rows in set
很明显,这不是最近登录的3个用户,这样写是不对的。

分析一下,这是因为MySQL先执行了group by,按`UserId`进行了分组,分组的过程中按照默认排序(这里是主键升序)每个用户取出一条记录,相当于执行:

mysql> select * from `table_test` group by `UserId`;
+----+--------+----------+---------------------+
| Id | UserId | UserName | Time
+----+--------+----------+---------------------+
|  1 |   1223 | test1	| 2011-12-05 21:46:32 
|  2 |   1224 | test2	| 2011-12-13 21:46:56 
|  3 |   1225 | test3	| 2011-12-14 21:47:24 
|  9 |   1226 | test4	| 2011-12-23 21:51:27 
+----+--------+----------+---------------------+
4 rows in set
然后再对取出来的数据用`Time`字段排序,最后取出前3条,相当于:

mysql> select `UserId`, `UserName`, `Time` from(select * from `table_test` group by `UserId`) t order by `Time` desc limit 3;
+--------+----------+---------------------+
| UserId | UserName | Time
+--------+----------+---------------------+
|   1226 | test4	| 2011-12-23 21:51:27 
|   1225 | test3	| 2011-12-14 21:47:24 
|   1224 | test2	| 2011-12-13 21:46:56 
+--------+----------+---------------------+
3 rows in set

正确地写法这里介绍两种第一种采用子查询:

mysql> select `UserId`, `UserName`, `Time`
	-> from (
	-> 	select `UserId`, `UserName`, `Time`
	->  from `table_test`
	->  order by `Time` desc
	-> ) t
	-> group by t.`UserId`
	-> order by t.`Time` desc
	-> limit 3;
+--------+----------+---------------------+
| UserId | UserName | Time
+--------+----------+---------------------+
|   1223 | test1	| 2011-12-24 21:52:05 
|   1226 | test4	| 2011-12-23 21:51:27 
|   1225 | test3	| 2011-12-19 21:48:16 
+--------+----------+---------------------+
3 rows in set

效果达到了,性能如何呢:

mysql> explain select `UserId`, `UserName`, `Time`
	-> from (
	-> 	select `UserId`, `UserName`, `Time`
	->  from `table_test`
	->  order by `Time` desc
	-> ) t
	-> group by t.`UserId`
	-> order by t.`Time` desc
	-> limit 3;
+----+-------------+------------+------+---------------+------+---------+------+------+---------------------------------+
| id | select_type | table	  | type | possible_keys | key  | key_len | ref  | rows | Extra						   
+----+-------------+------------+------+---------------+------+---------+------+------+---------------------------------+
|  1 | PRIMARY	 | <derived2> | ALL  | NULL		  | NULL | NULL	| NULL |   10 | Using temporary; Using filesort 
|  2 | DERIVED	 | table_test | ALL  | NULL		  | NULL | NULL	| NULL |   10 | Using filesort				  
+----+-------------+------------+------+---------------+------+---------+------+------+---------------------------------+
2 rows in set

执行计划很糟糕:全表扫描,Using filesort,Using temporary。即使`UserId`和`Time`字段加上索引也无济于事。

再看第二种方法

mysql> select `UserId`, `UserName`, max(`Time`) as `MaxTime` from `table_test` group by `UserId` order by `MaxTime` desc limit 3;
+--------+----------+---------------------+
| UserId | UserName | MaxTime			 
+--------+----------+---------------------+
|   1223 | test1	| 2011-12-24 21:52:05 
|   1226 | test4	| 2011-12-23 21:51:27 
|   1225 | test3	| 2011-12-19 21:48:16 
+--------+----------+---------------------+
3 rows in set

执行计划如下:

mysql> explain select `UserId`, `UserName`, max(`Time`) as `MaxTime` from `table_test` group by `UserId` order by `MaxTime` desc limit 3;
+----+-------------+------------+-------+---------------+--------------+---------+------+------+---------------------------------+
| id | select_type | table	  | type  | possible_keys | key		  | key_len | ref  | rows | Extra						   
+----+-------------+------------+-------+---------------+--------------+---------+------+------+---------------------------------+
|  1 | SIMPLE	  | table_test | index | NULL		  | index_userid | 4	   | NULL |   10 | Using temporary; Using filesort 
+----+-------------+------------+-------+---------------+--------------+---------+------+------+---------------------------------+
1 row in set

type为index,`UserId`字段上的索引也发挥了作用:)

【附】MySQL分组排序


阅读本文后,您的心情是:
 
恶心
愤怒
强赞
感动
路过
无聊
雷囧
关注
知识共享许可协议
评论(0) 浏览(23277) 引用(0)
引用地址:http://blog.baiwand.com/tb.php?sc=598e83&id=91
Tags:
« 【原创】简洁的JQ幻灯片插件 【分享】 jquery之管理包装元素集合(筛选元素包装集之添加更多元素到包装集,add()方法的使用) »

Blogger

  • blogger
  • 天之骄子
  • 职位:研发工程师
    铭言:
    阳光与欢乐同在,
    与我同在
    主页:
    blog.baiwand.com

分类目录

日志归档

主题标签

数据统计

  • 日志:151篇
  • 评论:45条
  • 碎语:264条
  • 引用:0条

链接表

随机日志 »

最新日志 »

最新评论 »

标签云 »

订阅Rss
sitemap