Quickly using the nearest elements of an array to determine if a new element is unique

In PHP, I'm grabbing lists of timestamps from multiple SQL tables and creating an array that lists the unique timestamps. The timestamps aren't identical across tables, however, and can vary by as much as a second for the same event. So, for example, I would want 1374531523.343 and 1374531524.012 to be considered the same event, but not 1374531520.342.

I've been using this function to grab the time separation to the nearest event:

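function findNearest($number, $array, $index = false) {
    // An empty array has no nearest element: report an infinite distance (or no index).
    if (empty($array)) {
        return $index ? null : INF;
    }
    $min = abs($number - $array[0]);
    $min_i = 0;
    foreach ($array as $ind => $value) {
        $mint = abs($number - $value);
        if ($mint < $min) {
            $min = $mint;
            $min_i = $ind;
        }
    }
    // Return the index of the nearest element if requested, otherwise the distance to it.
    return ($index ? $min_i : $min);
}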

(The index part is added because sometimes I need the index of the nearest time, but this could be moved to a separate function)

So basically I run a simple SELECT ... query for each table and check each timestamp:

while ($g = $q->fetch_object()) {
    if (findNearest($g->timestamp, $timestamps) > 1) $timestamps[] = $g->timestamp;
}

This works like I want it to, but I'm looking at around 100,000 different timestamps, and will be looking at as many as 500,000 in the future. As $timestamps gets larger and larger, this gets slower and slower. I know that's unavoidable, but perhaps there's a better approach which can cut way down on the time, either via MySQL or PHP?

3 solutions

#1

These steps could cut your processing time considerably, but they may take a little more effort than you expect.

Assuming that:

the variable $qry contains your query, whose result is stored in the variable $q

you have three tables as shown below

you run the query used by $q in your example sequentially, starting with table_1, then table_2, and finally table_3

Table table_1:

id | col_timestamp | parent_table | parent_id
---------------------------------------------
 1 | 1374531523.343|         NULL |      NULL

Table table_2:

id | col_timestamp | parent_table | parent_id
---------------------------------------------
 1 | 1374531520.444|         NULL |      NULL
 2 | 1374531524.012|      table_1 |         1
 3 | 1374531556.012|         NULL |      NULL
 4 | 1374531556.512|      table_2 |         3

Table table_3:

id | col_timestamp | parent_table | parent_id
---------------------------------------------
 1 | 1374531521.111|      table_2 |         1
 2 | 1374531523.111|      table_1 |         1

Explanation: the parent_table field is a varchar indicating which table (table_1, table_2 or table_3) the current row refers to. The parent_id field refers to the id field in the table named by parent_table.
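
To make the parent reference concrete, here is a minimal PHP sketch that resolves a row to the event it points at. The $tables array and the resolveEvent() helper are hypothetical: an in-memory stand-in for a few of the sample rows, not part of the schema itself.

```php
<?php
// Hypothetical in-memory copy of a few sample rows, keyed by table name and id.
$tables = [
    'table_1' => [
        1 => ['col_timestamp' => 1374531523.343, 'parent_table' => null, 'parent_id' => null],
    ],
    'table_2' => [
        1 => ['col_timestamp' => 1374531520.444, 'parent_table' => null, 'parent_id' => null],
        2 => ['col_timestamp' => 1374531524.012, 'parent_table' => 'table_1', 'parent_id' => 1],
    ],
    'table_3' => [
        1 => ['col_timestamp' => 1374531521.111, 'parent_table' => 'table_2', 'parent_id' => 1],
    ],
];

// Return [table, id] of the event a row belongs to: the row itself when it
// has no parent, otherwise the row named by parent_table and parent_id.
function resolveEvent(array $tables, string $table, int $id): array
{
    $row = $tables[$table][$id];
    if ($row['parent_table'] === null) {
        return [$table, $id];
    }
    return [$row['parent_table'], $row['parent_id']];
}

// table_2 row 2 and table_1 row 1 describe the same event.
var_dump(resolveEvent($tables, 'table_2', 2) === resolveEvent($tables, 'table_1', 1)); // bool(true)
```

Because a parent is always a row whose own parent_id is NULL, a single lookup is enough; there is no chain to follow.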

Now, every time a user inserts into any of the tables, we need to check whether a similar event already exists in the database. We can do this with a trigger. The following trigger fires every time a row is inserted into table_2:

DELIMITER $$

USE `your_database`$$

DROP TRIGGER /*!50032 IF EXISTS */ `before_insert_table_2`$$

CREATE TRIGGER `before_insert_table_2` BEFORE INSERT 
ON `table_2` FOR EACH ROW 
BEGIN
  DECLARE var_id INTEGER ;
  DECLARE var_table VARCHAR (10) ;
  -- Find the closest existing root event (parent_id IS NULL) within one second.
  SELECT id, parent_table INTO var_id, var_table 
  FROM
    ( SELECT id, 'table_1' AS parent_table, col_timestamp 
      FROM table_1 
      WHERE parent_id IS NULL 
      AND col_timestamp BETWEEN NEW.col_timestamp - 1 AND NEW.col_timestamp + 1 
      UNION
      SELECT id, 'table_2' AS parent_table, col_timestamp 
      FROM table_2 
      WHERE parent_id IS NULL 
      AND col_timestamp BETWEEN NEW.col_timestamp - 1 AND NEW.col_timestamp + 1
    ) AS candidates -- MySQL requires an alias on every derived table
  ORDER BY ABS(col_timestamp - NEW.col_timestamp), parent_table 
  LIMIT 1 ;
  -- Both variables stay NULL when no match is found, marking the row as a new event.
  SET NEW.parent_id = var_id ;
  SET NEW.parent_table = var_table ;
END ;
$$

DELIMITER ;
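
The matching rule inside the trigger can be easier to follow outside SQL. The sketch below mirrors it in PHP on an in-memory list of root rows; findParent() and the row layout are hypothetical names chosen for illustration, and the sketch is not a replacement for the trigger.

```php
<?php
// Hypothetical helper mirroring the trigger's SELECT: among root rows
// (those with parent_id IS NULL) within one second of $newTimestamp, pick
// the closest, breaking ties by table name. Returns null when none qualifies.
function findParent(array $roots, float $newTimestamp): ?array
{
    // Same inclusive window as BETWEEN NEW.col_timestamp - 1 AND NEW.col_timestamp + 1.
    $candidates = array_filter($roots, function (array $row) use ($newTimestamp): bool {
        return abs($row['col_timestamp'] - $newTimestamp) <= 1;
    });
    if (!$candidates) {
        return null;
    }
    // Same ordering as ORDER BY ABS(col_timestamp - NEW.col_timestamp), parent_table.
    usort($candidates, function (array $a, array $b) use ($newTimestamp): int {
        $byDistance = abs($a['col_timestamp'] - $newTimestamp) <=> abs($b['col_timestamp'] - $newTimestamp);
        return $byDistance !== 0 ? $byDistance : strcmp($a['table'], $b['table']);
    });
    return $candidates[0];
}

$roots = [
    ['table' => 'table_1', 'id' => 1, 'col_timestamp' => 1374531523.343],
    ['table' => 'table_2', 'id' => 1, 'col_timestamp' => 1374531520.444],
];

$parent = findParent($roots, 1374531524.012);
echo $parent['table'], ' ', $parent['id'], "\n"; // table_1 1
var_dump(findParent($roots, 1374531556.012));    // NULL
```

A null result corresponds to the trigger leaving parent_table and parent_id as NULL, which marks the inserted row as a new event.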

Create similar triggers for table_1 and table_3.

The next step is to set parent_table and parent_id for the data that already exists. You could modify your $qry to fetch each row's table name and id, then update the associated rows. This step only needs to be run once.
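
One possible way to do this one-off backfill, offered as a sketch rather than as the method this answer prescribes, is to load every row's table name, id and timestamp into PHP, sort by timestamp, and sweep through the list once. Each row is then compared only with the most recent root instead of with every earlier timestamp. The assignParents() helper and the row layout are assumptions for illustration. Linking to the latest root is also a simplification of the trigger's nearest-match rule, so results can differ when events lie less than two seconds apart.

```php
<?php
// Hypothetical backfill helper: sorts rows by timestamp, then links each row
// to the most recent root when it lies within $tolerance seconds of it.
// Rows that start a new event keep parent_table and parent_id as null.
function assignParents(array $rows, float $tolerance = 1.0): array
{
    usort($rows, function (array $a, array $b): int {
        return $a['col_timestamp'] <=> $b['col_timestamp'];
    });
    $root = null;
    foreach ($rows as &$row) {
        if ($root !== null && $row['col_timestamp'] - $root['col_timestamp'] <= $tolerance) {
            $row['parent_table'] = $root['table'];
            $row['parent_id'] = $root['id'];
        } else {
            $row['parent_table'] = null;
            $row['parent_id'] = null;
            $root = $row; // copied by value: this row becomes the current root
        }
    }
    unset($row); // break the reference left behind by foreach
    return $rows;
}

// The three timestamps from the question, spread over two tables.
$rows = assignParents([
    ['table' => 'table_1', 'id' => 1, 'col_timestamp' => 1374531523.343],
    ['table' => 'table_2', 'id' => 1, 'col_timestamp' => 1374531520.342],
    ['table' => 'table_2', 'id' => 2, 'col_timestamp' => 1374531524.012],
]);
foreach ($rows as $row) {
    echo $row['table'], ' ', $row['id'], ' -> ', $row['parent_table'] ?? 'new event', "\n";
}
```

With these inputs the three timestamps collapse into two events. Each row that ends up with a non-null parent would then need one UPDATE against its own table.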

The next step is to modify your query to fetch the events. For example:

SELECT 'table_1' original_table, id
FROM table_1 
WHERE parent_id IS NULL
UNION
SELECT 'table_2' original_table, id
FROM table_2
WHERE parent_id IS NULL
UNION
SELECT 'table_3' original_table, id
FROM table_3
WHERE parent_id IS NULL

The last step is to modify your program to work with these database changes.

Hope this helps.
