I've got four MySQL tables:

users (id, name)
polls (id, text)
options (id, poll_id, text)
responses (id, poll_id, option_id, user_id)

Given a particular poll and a particular option, I'd like to generate a table that shows which options from other polls are most strongly correlated.

Suppose this is our data set:

TABLE users:
+------+-------+
| id   | name  |
+------+-------+
|    1 | Abe   |
|    2 | Bob   |
|    3 | Che   |
|    4 | Den   |
+------+-------+

TABLE polls:
+------+-----------------------+
| id   | text                  |
+------+-----------------------+
|    1 | Do you like apples?   |
|    2 | What is your gender?  |
|    3 | What is your height?  |
|    4 | Do you like polls?    |
+------+-----------------------+

TABLE options:

+------+----------+---------+
| id   | poll_id  | text    |
+------+----------+---------+
|    1 | 1        | Yes     |
|    2 | 1        | No      |
|    3 | 2        | Male    |
|    4 | 2        | Female  |
|    5 | 3        | Short   |
|    6 | 3        | Tall    |
|    7 | 4        | Yes     |
|    8 | 4        | No      |
+------+----------+---------+

TABLE responses:

+------+----------+------------+----------+
| id   | poll_id  | option_id  | user_id  |
+------+----------+------------+----------+
|    1 | 1        | 1          | 1        |
|    2 | 1        | 2          | 2        |
|    3 | 1        | 2          | 3        |
|    4 | 1        | 2          | 4        |
|    5 | 2        | 3          | 1        |
|    6 | 2        | 3          | 2        |
|    7 | 2        | 3          | 3        |
|    8 | 2        | 4          | 4        |
|    9 | 3        | 5          | 1        |
|   10 | 3        | 6          | 2        |
|   11 | 3        | 5          | 3        |
|   12 | 3        | 6          | 4        |
|   13 | 4        | 7          | 1        |
|   14 | 4        | 7          | 2        |
|   15 | 4        | 7          | 3        |
|   16 | 4        | 7          | 4        |
+------+----------+------------+----------+

Given the poll ID 1 and the option ID 2, the generated table should be something like this:

+----------+------------+-----------------------+
| poll_id  | option_id  | percent_correlated    |
+----------+------------+-----------------------+
| 4        | 7          | 100                   |
| 2        | 3          | 66.66                 |
| 3        | 6          | 66.66                 |
| 2        | 4          | 33.33                 |
| 3        | 5          | 33.33                 |
| 4        | 8          | 0                     |
+----------+------------+-----------------------+

So basically, we're identifying all of the users who responded to poll ID 1 and selected option ID 2, and we're looking through all the other polls to see what percentage of them also selected each other option.

3 Solutions

#1


I don't have an instance handy to test this, so can you check whether it gets the proper results?

select
        poll_id,
        option_id,
        -- Pearson correlation coefficient between score and score_rev
        ((psum - (sum1 * sum2 / n)) / sqrt((sum1sq - pow(sum1, 2.0) / n) * (sum2sq - pow(sum2, 2.0) / n))) AS r,
        n
from
(
    -- per-option sums needed by the correlation formula
    select 
        poll_id,
        option_id,
        SUM(score) AS sum1,
        SUM(score_rev) AS sum2,
        SUM(score * score) AS sum1sq,
        SUM(score_rev * score_rev) AS sum2sq,
        SUM(score * score_rev) AS psum,
        COUNT(*) AS n
    from
    (
            -- score is 1 when the responding user also chose option 2 in poll 1
            select 
                responses.poll_id, 
                responses.option_id,
                CASE 
                    WHEN user_resp.user_id IS NULL THEN 0
                    ELSE 1
                END as score,
                CASE 
                    WHEN user_resp.user_id IS NULL THEN 1
                    ELSE 0
                END as score_rev
            from responses left outer join 
                    (
                        select 
                            user_id
                        from 
                            responses 
                        where
                            poll_id = 1 and 
                            option_id = 2
                    ) user_resp  
                        ON (user_resp.user_id = responses.user_id)
    ) temp1 
    group by
        poll_id,
        option_id
) components 
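
If the goal is just the percentage table from the question, a plain count may be enough. Note that in the query above score_rev is always 1 - score, so r can only come out as -1 or NULL. The following is a minimal sketch and has not been run against MySQL: it loads the sample data into an in-memory SQLite database through Python's sqlite3 module (my choice for a quick check, not something from the question) and runs a query that sticks to portable SQL. The names correlated_options and build_sample_db are made up for illustration, and :poll / :opt are sqlite3 named parameters that you would bind or inline differently with a MySQL client.

```python
import sqlite3

# For every option in the *other* polls, count how many of the users who
# picked (:poll, :opt) also picked that option, as a share of those users.
# The LEFT JOIN keeps options that none of those users chose (0 percent).
QUERY = """
SELECT o.poll_id,
       o.id AS option_id,
       ROUND(100.0 * COUNT(r.id) /
             (SELECT COUNT(*) FROM responses
               WHERE poll_id = :poll AND option_id = :opt), 2)
           AS percent_correlated
FROM options AS o
LEFT JOIN responses AS r
       ON r.option_id = o.id
      AND r.user_id IN (SELECT user_id FROM responses
                         WHERE poll_id = :poll AND option_id = :opt)
WHERE o.poll_id <> :poll
GROUP BY o.poll_id, o.id
ORDER BY percent_correlated DESC, o.poll_id, o.id
"""


def correlated_options(conn, poll_id, option_id):
    """Return (poll_id, option_id, percent_correlated) rows."""
    return conn.execute(QUERY, {"poll": poll_id, "opt": option_id}).fetchall()


def build_sample_db():
    """Load the options and responses tables from the question."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE options (id INTEGER, poll_id INTEGER, text TEXT);
        CREATE TABLE responses (id INTEGER, poll_id INTEGER,
                                option_id INTEGER, user_id INTEGER);
    """)
    conn.executemany("INSERT INTO options VALUES (?, ?, ?)", [
        (1, 1, "Yes"), (2, 1, "No"), (3, 2, "Male"), (4, 2, "Female"),
        (5, 3, "Short"), (6, 3, "Tall"), (7, 4, "Yes"), (8, 4, "No"),
    ])
    # (poll_id, option_id, user_id); response ids are assigned sequentially
    rows = [
        (1, 1, 1), (1, 2, 2), (1, 2, 3), (1, 2, 4),
        (2, 3, 1), (2, 3, 2), (2, 3, 3), (2, 4, 4),
        (3, 5, 1), (3, 6, 2), (3, 5, 3), (3, 6, 4),
        (4, 7, 1), (4, 7, 2), (4, 7, 3), (4, 7, 4),
    ]
    conn.executemany("INSERT INTO responses VALUES (?, ?, ?, ?)",
                     [(i, *row) for i, row in enumerate(rows, start=1)])
    return conn


if __name__ == "__main__":
    for row in correlated_options(build_sample_db(), 1, 2):
        print(row)
```

The rows come back in the same order as the table in the question. The percentages are rounded rather than truncated, so two out of three users shows up as 66.67 instead of 66.66.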
