I'm trying to optimize the following query:

    SELECT name
    FROM tbl
    WHERE user_id IN (".$user_ids.")
    GROUP BY name
    ORDER BY SUM(counter) DESC
    LIMIT 10

Table info: name is a VARCHAR; counter and user_id are INTs. The pair (user_id, name) is unique.

I've tried adding IDX(user_id, counter, name), but EXPLAIN still shows "Using where; Using index; Using temporary; Using filesort", so I guess I'm doing something wrong.

What is the proper index for such a query?

2 solutions

#1


The correct index is IDX(user_id, name, counter), but the query still needs additional computation after the data is read from the index. If there are only about 10 distinct names, there is hardly anything you can do (most of the time is taken by the SUM operation), but if there are many distinct names, you can reduce the sorting work by using some empirical knowledge about the SUM(counter) threshold:

    SELECT name
    FROM tbl
    WHERE user_id IN (".$user_ids.")
    GROUP BY name
    HAVING SUM(counter) > 1000 -- adjust the threshold
    ORDER BY SUM(counter) DESC
    LIMIT 10
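
To see what the threshold does to the result, here is a minimal sketch that runs both versions of the query against an in-memory SQLite table from Python. It is only a stand-in for MySQL: SQLite's planner is different, so this says nothing about speed or about the EXPLAIN output. The sample rows, the index name idx_user_name_counter and the helper top_names are all made up for illustration.

```python
import sqlite3

# In-memory SQLite stand-in for the MySQL table; the rows are made-up sample data.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE tbl (user_id INTEGER, name TEXT, counter INTEGER, "
    "UNIQUE (user_id, name))"
)
# Column order from the answer: filter column, grouping column, summed column.
# The index name is hypothetical.
conn.execute("CREATE INDEX idx_user_name_counter ON tbl (user_id, name, counter)")
rows = [
    (1, "alpha", 900), (2, "alpha", 400),  # SUM = 1300
    (1, "beta", 50), (3, "beta", 20),      # SUM = 70
    (2, "gamma", 2000),                    # SUM = 2000
    (4, "delta", 5000),                    # user 4 is not in the IN list below
]
conn.executemany("INSERT INTO tbl VALUES (?, ?, ?)", rows)


def top_names(user_ids, threshold=None, limit=10):
    """Return names ordered by SUM(counter), optionally pre-filtered with HAVING."""
    placeholders = ",".join("?" for _ in user_ids)
    having = "HAVING SUM(counter) > ? " if threshold is not None else ""
    sql = (
        f"SELECT name FROM tbl WHERE user_id IN ({placeholders}) "
        f"GROUP BY name {having}"
        "ORDER BY SUM(counter) DESC LIMIT ?"
    )
    params = list(user_ids)
    if threshold is not None:
        params.append(threshold)
    params.append(limit)
    return [name for (name,) in conn.execute(sql, params)]


no_threshold = top_names([1, 2, 3])
with_threshold = top_names([1, 2, 3], threshold=1000)
print(no_threshold)    # ['gamma', 'alpha', 'beta']
print(with_threshold)  # ['gamma', 'alpha']
```

Note that the HAVING version drops "beta" entirely. That is the trade-off: fewer groups reach the sort, but if the threshold is set too high the query can return fewer than 10 names.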

UPD1. Hm, if you say that you've tried the IDX(user_id, name, counter) index and the performance is the same, I can't actually see why it would be slow, unless you pass several hundred user ids (in which case the time is spent on parsing the query rather than on executing it).

UPD2. The MySQL IN operator does some additional magic:

Returns 1 if expr is equal to any of the values in the IN list, else returns 0. If all values are constants, they are evaluated according to the type of expr and sorted. The search for the item then is done using a binary search.

That means that if you pass INT values to the operator, as in IN (1, 2, 3), they are sorted as INTs; if you pass integers serialized as strings, as in IN ('1', '11', '111', '12'), they are sorted in lexicographical order. The rationale for the sorting is to eliminate random index reads, which matters when you pass a lot of values to the operator.
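
The difference between the two orderings, and the "sort once, then binary search" idea from the quoted documentation, can be illustrated with a few lines of Python. This is only a sketch of the idea, not MySQL's actual implementation; the helper in_list is a made-up name.

```python
from bisect import bisect_left

# The same four values, once as integers and once as strings.
as_ints = [12, 1, 111, 11]
as_strings = ["12", "1", "111", "11"]

sorted_ints = sorted(as_ints)        # numeric order
sorted_strings = sorted(as_strings)  # lexicographic order
print(sorted_ints)     # [1, 11, 12, 111]
print(sorted_strings)  # ['1', '11', '111', '12']


def in_list(expr, values):
    """Mimic IN over constants: sort the list, then binary-search for expr."""
    ordered = sorted(values)
    i = bisect_left(ordered, expr)
    return 1 if i < len(ordered) and ordered[i] == expr else 0


print(in_list(12, as_ints))  # 1
print(in_list(13, as_ints))  # 0
```

So passing the ids as real integers rather than quoted strings keeps the list in the same order as the integer user_id values in the index.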
