Advanced MySQL: Finding correlations between poll responses
I've got four MySQL tables:
users (id, name)
polls (id, text)
options (id, poll_id, text)
responses (id, poll_id, option_id, user_id)
Given a particular poll and a particular option, I'd like to generate a table that shows which options from other polls are most strongly correlated.
Suppose this is our data set:
TABLE users:
+------+-------+
| id | name |
+------+-------+
| 1 | Abe |
| 2 | Bob |
| 3 | Che |
| 4 | Den |
+------+-------+
TABLE polls:
+------+-----------------------+
| id | text |
+------+-----------------------+
| 1 | Do you like apples? |
| 2 | What is your gender? |
| 3 | What is your height? |
| 4 | Do you like polls? |
+------+-----------------------+
TABLE options:
+------+----------+---------+
| id | poll_id | text |
+------+----------+---------+
| 1 | 1 | Yes |
| 2 | 1 | No |
| 3 | 2 | Male |
| 4 | 2 | Female |
| 5 | 3 | Short |
| 6 | 3 | Tall |
| 7 | 4 | Yes |
| 8 | 4 | No |
+------+----------+---------+
TABLE responses:
+------+----------+------------+----------+
| id | poll_id | option_id | user_id |
+------+----------+------------+----------+
| 1 | 1 | 1 | 1 |
| 2 | 1 | 2 | 2 |
| 3 | 1 | 2 | 3 |
| 4 | 1 | 2 | 4 |
| 5 | 2 | 3 | 1 |
| 6 | 2 | 3 | 2 |
| 7 | 2 | 3 | 3 |
| 8 | 2 | 4 | 4 |
| 9 | 3 | 5 | 1 |
| 10 | 3 | 6 | 2 |
| 11 | 3 | 5 | 3 |
| 12 | 3 | 6 | 4 |
| 13 | 4 | 7 | 1 |
| 14 | 4 | 7 | 2 |
| 15 | 4 | 7 | 3 |
| 16 | 4 | 7 | 4 |
+------+----------+------------+----------+
Given the poll ID 1 and the option ID 2, the generated table should be something like this:
+----------+------------+-----------------------+
| poll_id | option_id | percent_correlated |
+----------+------------+-----------------------+
| 4 | 7 | 100 |
| 2 | 3 | 66.66 |
| 3 | 6 | 66.66 |
| 2 | 4 | 33.33 |
| 3 | 5 | 33.33 |
| 4 | 8 | 0 |
+----------+------------+-----------------------+
So basically, we're identifying all of the users who responded to poll ID 1 and selected option ID 2, and we're looking through all the other polls to see what percentage of them also selected each other option.
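To make that description concrete, here is a minimal sketch of one way the percentage table could be computed. It is an illustration only, not a tested MySQL solution: it loads the sample data into an in-memory SQLite database through Python's sqlite3 module (a stand-in for MySQL), the function name correlated_options is made up for this example, and it assumes each user answers a poll at most once. The SQL itself uses only standard constructs, so it should port to MySQL with literal values in place of the named placeholders, but that is untested here.

```python
import sqlite3

# In-memory SQLite database used as a stand-in for MySQL (assumption).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE options (id INTEGER, poll_id INTEGER, text TEXT)")
conn.execute(
    "CREATE TABLE responses "
    "(id INTEGER, poll_id INTEGER, option_id INTEGER, user_id INTEGER)"
)

# Sample data from the question (only the two tables the query needs).
conn.executemany("INSERT INTO options VALUES (?, ?, ?)", [
    (1, 1, "Yes"), (2, 1, "No"), (3, 2, "Male"), (4, 2, "Female"),
    (5, 3, "Short"), (6, 3, "Tall"), (7, 4, "Yes"), (8, 4, "No"),
])
conn.executemany("INSERT INTO responses VALUES (?, ?, ?, ?)", [
    (1, 1, 1, 1), (2, 1, 2, 2), (3, 1, 2, 3), (4, 1, 2, 4),
    (5, 2, 3, 1), (6, 2, 3, 2), (7, 2, 3, 3), (8, 2, 4, 4),
    (9, 3, 5, 1), (10, 3, 6, 2), (11, 3, 5, 3), (12, 3, 6, 4),
    (13, 4, 7, 1), (14, 4, 7, 2), (15, 4, 7, 3), (16, 4, 7, 4),
])

QUERY = """
SELECT o.poll_id,
       o.id AS option_id,
       -- share of the target users who picked this option
       100.0 * COUNT(r.user_id) /
           (SELECT COUNT(*) FROM responses
             WHERE poll_id = :poll AND option_id = :option)
           AS percent_correlated
FROM options AS o
-- LEFT JOIN keeps options that none of the target users picked (0 percent)
LEFT JOIN responses AS r
       ON r.option_id = o.id
      AND r.user_id IN (SELECT user_id FROM responses
                         WHERE poll_id = :poll AND option_id = :option)
WHERE o.poll_id <> :poll
GROUP BY o.poll_id, o.id
ORDER BY percent_correlated DESC, o.poll_id, o.id
"""


def correlated_options(connection, poll_id, option_id):
    """Return (poll_id, option_id, percent_correlated) rows for other polls."""
    params = {"poll": poll_id, "option": option_id}
    return connection.execute(QUERY, params).fetchall()


for row in correlated_options(conn, 1, 2):
    print(row)
```

Starting from the options table rather than from responses is what lets an option nobody chose (such as option 8) still appear in the output with a count of zero.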
3 Solutions
#1
I don't have an instance handy to test this, so can you check whether it returns the right results:
select
    poll_id,
    option_id,
    -- Pearson correlation coefficient built from the sums computed below
    ((psum - (sum1 * sum2 / n)) / sqrt((sum1sq - pow(sum1, 2.0) / n) * (sum2sq - pow(sum2, 2.0) / n))) AS r,
    n
from
(
    select
        poll_id,
        option_id,
        SUM(score) AS sum1,
        SUM(score_rev) AS sum2,
        SUM(score * score) AS sum1sq,
        SUM(score_rev * score_rev) AS sum2sq,
        SUM(score * score_rev) AS psum,
        COUNT(*) AS n
    from
    (
        select
            responses.poll_id,
            responses.option_id,
            -- score is 1 when the respondent also chose option 2 in poll 1
            CASE
                WHEN user_resp.user_id IS NULL THEN 0
                ELSE 1
            END AS score,
            -- score_rev is the complement of score
            CASE
                WHEN user_resp.user_id IS NULL THEN 1
                ELSE 0
            END AS score_rev
        from responses left outer join
        (
            -- users who chose option 2 in poll 1
            select
                user_id
            from
                responses
            where
                poll_id = 1 and
                option_id = 2
        ) user_resp
        ON (user_resp.user_id = responses.user_id)
    ) temp1
    group by
        poll_id,
        option_id
) components
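The expression for r in the outer select is the usual sums-based form of the Pearson correlation coefficient. As a rough sketch of what that arithmetic does, here is the same formula in plain Python, using the same intermediate names (sum1, sum2, sum1sq, sum2sq, psum, n). The function name pearson_from_sums is made up for this illustration, and returning None for a zero denominator is a choice made here to mirror a NULL result, not something taken from the answer.

```python
from math import sqrt


def pearson_from_sums(xs, ys):
    """Pearson r computed from running sums, as in the outer SELECT."""
    n = len(xs)
    if n == 0:
        return None
    sum1 = sum(xs)
    sum2 = sum(ys)
    sum1sq = sum(x * x for x in xs)
    sum2sq = sum(y * y for y in ys)
    psum = sum(x * y for x, y in zip(xs, ys))
    denominator = sqrt((sum1sq - sum1 ** 2 / n) * (sum2sq - sum2 ** 2 / n))
    if denominator == 0:
        # One of the series is constant, so r is undefined.
        return None
    return (psum - sum1 * sum2 / n) / denominator


print(pearson_from_sums([1, 2, 3], [2, 4, 6]))
print(pearson_from_sums([1, 0, 0, 1], [0, 1, 1, 0]))
```

One property worth keeping in mind when checking the query's output: if the second series is exactly the complement of the first, as score_rev is of score, this formula gives -1 whenever it is defined at all.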