一、问题来源

有一个朋友@水米田 问我,基于POSITION的主从。他做了如下的操作

将备份的一些binlog文件加入到了目录中
修改index文件,加入了这些binlog文件
flush binary logs
然后整个主从环境大量延迟。

二、朋友的测试

下面是另外一个朋友@徐晨亮的测试:

master上:(root:db1@xucl:10:30:22)[(none)]> show binary logs;+---------------------+-----------+| Log_name      | File_size |+---------------------+-----------+| mysql-binlog.000031 |   1019 || mysql-binlog.000032 |    424 || mysql-binlog.000033 |    244 || mysql-binlog.000034 |   2332 || mysql-binlog.000035 |   2134 || mysql-binlog.000036 |  845915 || mysql-binlog.000037 |   11735 || mysql-binlog.000038 |    284 || mysql-binlog.000039 |    284 || mysql-binlog.000040 |    284 || mysql-binlog.000041 |    284 || mysql-binlog.000042 |    234 |+---------------------+-----------+12 rows in set (0.00 sec)(root:db1@xucl:10:30:34)[(none)]> purge binary logs to 'mysql-binlog.000039';Query OK, 0 rows affected (0.00 sec)(root:db1@xucl:10:30:49)[(none)]> show binary logs;+---------------------+-----------+| Log_name      | File_size |+---------------------+-----------+| mysql-binlog.000039 |    284 || mysql-binlog.000040 |    284 || mysql-binlog.000041 |    284 || mysql-binlog.000042 |    234 |+---------------------+-----------+4 rows in set (0.00 sec)执行插入数据(root:db1@xucl:10:31:23)[xucl]> insert into t values(9,9);将备份的binlog拷贝回原先的目录并修改index文件进行注册[root@izbp12nspj47ypto9t6vyez logs]# ll总用量 884-rw-r----- 1 mysql mysql  1019 5月 20 22:03 mysql-binlog.000031-rw-r----- 1 mysql mysql  424 5月 20 22:03 mysql-binlog.000032-rw-r----- 1 mysql mysql  244 5月 20 22:03 mysql-binlog.000033-rw-r----- 1 mysql mysql  2332 5月 20 22:03 mysql-binlog.000034-rw-r----- 1 mysql mysql  2134 5月 20 22:03 mysql-binlog.000035-rw-r----- 1 mysql mysql 845915 5月 20 22:03 mysql-binlog.000036-rw-r----- 1 mysql mysql 11735 5月 20 22:05 mysql-binlog.000037-rw-r----- 1 mysql mysql  284 5月 20 22:06 mysql-binlog.000038-rw-r----- 1 mysql mysql  284 5月 21 10:28 mysql-binlog.000039-rw-r----- 1 mysql mysql  284 5月 21 10:28 mysql-binlog.000040-rw-r----- 1 mysql mysql  284 5月 21 10:28 mysql-binlog.000041-rw-r----- 1 mysql mysql  491 5月 21 10:31 mysql-binlog.000042-rw-r----- 1 mysql mysql  204 5月 21 10:30 mysql-binlog.index主库flush binary logs(root:db1@xucl:10:32:51)[(none)]> flush binary logs;Query OK, 0 rows affected (0.01 sec)(root:db1@xucl:10:32:57)[(none)]> show binary logs;+---------------------+-----------+| Log_name      | File_size |+---------------------+-----------+| mysql-binlog.000031 |   1019 || mysql-binlog.000032 |    424 || mysql-binlog.000033 |    244 || mysql-binlog.000034 |   2332 || mysql-binlog.000035 |   2134 || mysql-binlog.000036 |  845915 || mysql-binlog.000037 |   11735 || mysql-binlog.000038 |    284 || mysql-binlog.000039 |    284 || mysql-binlog.000040 |    284 || mysql-binlog.000041 |    284 || mysql-binlog.000042 |    541 || mysql-binlog.000043 |    234 |+---------------------+-----------+13 rows in set (0.00 sec)此时,slave报错如下:(root:db1@xucl:10:31:05)[(none)]> show slave status\G*************************** 1. row ***************************        Slave_IO_State:         Master_Host: 127.0.0.1         Master_User: repl         Master_Port: 3306        Connect_Retry: 60       Master_Log_File: mysql-binlog.000035     Read_Master_Log_Pos: 194        Relay_Log_File: izbp12nspj47ypto9t6vyez-relay-bin.000011        Relay_Log_Pos: 373    Relay_Master_Log_File: mysql-binlog.000035       Slave_IO_Running: No      Slave_SQL_Running: Yes       Replicate_Do_DB:     Replicate_Ignore_DB:      Replicate_Do_Table:    Replicate_Ignore_Table:   Replicate_Wild_Do_Table: Replicate_Wild_Ignore_Table:          Last_Errno: 0          Last_Error:         Skip_Counter: 0     Exec_Master_Log_Pos: 194       Relay_Log_Space: 648       Until_Condition: None        Until_Log_File:        Until_Log_Pos: 0      Master_SSL_Allowed: No      Master_SSL_CA_File:      Master_SSL_CA_Path:       Master_SSL_Cert:      Master_SSL_Cipher:        Master_SSL_Key:    Seconds_Behind_Master: NULLMaster_SSL_Verify_Server_Cert: No        Last_IO_Errno: 1236        Last_IO_Error: Got fatal error 1236 from master when reading data from binary log: 'Cannot replicate GTID-transaction when @@GLOBAL.GTID_MODE = OFF, at file /storage/single/mysql3306/logs/mysql-binlog.000035, position 194.; the first event 'mysql-binlog.000039' at 234, the last event read from '/storage/single/mysql3306/logs/mysql-binlog.000035' at 259, the last byte read from '/storage/single/mysql33'        Last_SQL_Errno: 0        Last_SQL_Error: Replicate_Ignore_Server_Ids:       Master_Server_Id: 3306         Master_UUID: e8bdf01a-c79b-11e8-82b3-00163e088352       Master_Info_File: mysql.slave_master_info          SQL_Delay: 0     SQL_Remaining_Delay: NULL   Slave_SQL_Running_State: Slave has read all relay log; waiting for more updates      Master_Retry_Count: 86400         Master_Bind:   Last_IO_Error_Timestamp: 190521 10:32:57   Last_SQL_Error_Timestamp:        Master_SSL_Crl:      Master_SSL_Crlpath:      Retrieved_Gtid_Set:      Executed_Gtid_Set: 4c423515-6661-11e9-b767-00163e088352:1-7,e8bdf01a-c79b-11e8-82b3-00163e088352:1-57192        Auto_Position: 0     Replicate_Rewrite_DB:         Channel_Name:      Master_TLS_Version:1 row in set (0.00 sec)从slave上的报错来看,在主库flush binary logs后,从库又读取注册的binlog并且又apply了

从整个测试来看,MySQL视乎将手动注册的文件进行了传输和应用。报错是因为这个库以前是GITD_MODE=ON的,但是测试的时候已经关闭了GTID_MODE,改为了POSITION的模式,这个报错是因为DUMP线程会进行检测:

这个图来自我新写的一个系列(暂时还没发布,大概年底才能写好)。不管怎么说,DUMP线程已经在开始传输老的binlog文件了。那么原因是什么呢?下面我们将进行解释。

四、flush binary logs对binlog的操作

flush binary logs 将包含如下操作:

  • 获取新的binlog文件名字
  • 关闭旧的binlog
  • 关闭index file
  • 打开index file
  • 打开新的binlog
  • 将新的binlog加入到indexfile

具体可以参考函数MYSQL_BIN_LOG::new_file_impl。其中要说明一下获取新的binlog文件名字是通过函数find_uniq_filename实现的,其中包含如下代码:

 file_info= dir_info->dir_entry; for (i= dir_info->number_off_files ; i-- ; file_info++) {  if (strncmp(file_info->name, start, length) == 0 &&  is_number(file_info->name+length, &number,0))  {   set_if_bigger(max_found, number);  } }... *next= (need_next || max_found == 0) ? max_found + 1 : max_found;
#0 find_uniq_filename (name=0x7fffec5ec6d0 "/mysqldata/mysql3340/log/binlog", next=0x7fffec5ec678, need_next=true)  at /mysqldata/percona-server-locks-detail-5.7.22/sql/binlog.cc:3679#1 0x000000000187d208 in generate_new_log_name (new_name=0x7fffec5ec6d0 "/mysqldata/mysql3340/log/binlog", new_ext=0x0,   log_name=0x7ffedc011520 "/mysqldata/mysql3340/log/binlog", is_binlog=true) at /mysqldata/percona-server-locks-detail-5.7.22/sql/binlog.cc:3767#2 0x0000000001884fdb in MYSQL_BIN_LOG::new_file_impl (this=0x2e83640, need_lock_log=false, extra_description_event=0x0)  at /mysqldata/percona-server-locks-detail-5.7.22/sql/binlog.cc:7175#3 0x0000000001884cbb in MYSQL_BIN_LOG::new_file_without_locking (this=0x2e83640, extra_description_event=0x0)  at /mysqldata/percona-server-locks-detail-5.7.22/sql/binlog.cc:7099#4 0x0000000001886b6b in MYSQL_BIN_LOG::rotate (this=0x2e83640, force_rotate=true, check_purge=0x7fffec5ecbfb)  at /mysqldata/percona-server-locks-detail-5.7.22/sql/binlog.cc:7775#5 0x0000000001886d53 in MYSQL_BIN_LOG::rotate_and_purge (this=0x2e83640, thd=0x7ffedc000b90, force_rotate=true)  at /mysqldata/percona-server-locks-detail-5.7.22/sql/binlog.cc:7846

五、主从问题的产生

binlog切换后,DUMP线程也需要读取下一个binlog文件。我们来看看它是怎么确认读取哪一个文件的。

在函数sender.run()中可以找到循环每个binlog文件的代码。下面一句是寻找下一个binlog文件:

int error= mysql_bin_log.find_next_log(&m_linfo, 0);
 my_b_seek(&index_file, linfo->index_file_offset);//进行偏移量偏移 linfo->index_file_start_offset= linfo->index_file_offset; length=my_b_gets(&index_file, fname, FN_REFLEN));//读取文件名字...  if(normalize_binlog_name(full_fname, fname, is_relay_log))... linfo->index_file_offset= my_b_tell(&index_file);//偏移量从新记录以备下一次使用
  1. 如果是GTID_MODE 关闭,使用POSITION那么这种情况一定报错,比如重复的行。
  2. 如果是GTID_MODE 和AUTO_POSITION=1,那么DUMP线程会进行GTID的过滤不发送,因为Event不发送所以延迟会持续一段时间为0。
  3. 如果是GTID_MODE 和AUTO_POSITION=0,那么SQL线程应用GITD_EVENT的时候会进行过滤,延迟可能出现很大的情况。

尽管GTID可能可以屏蔽这种问题,但是DUMP线程已经在考虑发送老的binlog文件了,这是不合适的。

六、purge binary logs能够维护这个偏移量

为什么purge binary logs不会出现问题呢,因为在purge binary logs的语句下,会维护这个偏移量如下:

 virtual void operator()(THD *thd) {  LOG_INFO* linfo;  mysql_mutex_lock(&thd->LOCK_thd_data);  if ((linfo= thd->current_linfo))//b binlog.cc:2829  {   /*    Index file offset can be less that purge offset only if    we just started reading the index file. In that case    we have nothing to adjust.   */   if (linfo->index_file_offset < m_purge_offset)    linfo->fatal = (linfo->index_file_offset != 0);   else    linfo->index_file_offset -= m_purge_offset;  }  mysql_mutex_unlock(&thd->LOCK_thd_data);
#0 Adjust_offset::operator() (this=0x7fffec5ec720, thd=0x7ffedc000be0) at /mysqldata/percona-server-locks-detail-5.7.22/sql/binlog.cc:2831#1 0x0000000000eef320 in Do_THD::operator() (this=0x7fffec5ec6a0, thd=0x7ffedc000be0) at /mysqldata/percona-server-locks-detail-5.7.22/sql/mysqld_thd_manager.cc:46#2 0x0000000000eefa0f in std::for_each<THD**, Do_THD> (__first=0x3003358, __last=0x3003368, __f=...)  at /usr/lib/gcc/x86_64-redhat-linux/4.4.7/../../../../include/c++/4.4.7/bits/stl_algo.h:4200#3 0x0000000000eeefc0 in Global_THD_manager::do_for_all_thd (this=0x3003340, func=0x7fffec5ec720)  at /mysqldata/percona-server-locks-detail-5.7.22/sql/mysqld_thd_manager.cc:273#4 0x000000000187ae0a in adjust_linfo_offsets (purge_offset=0) at /mysqldata/percona-server-locks-detail-5.7.22/sql/binlog.cc:2873#5 0x0000000001883239 in MYSQL_BIN_LOG::remove_logs_from_index (this=0x2e83640, log_info=0x7fffec5ec7d0, need_update_threads=true)  at /mysqldata/percona-server-locks-detail-5.7.22/sql/binlog.cc:6352#6 0x0000000001883676 in MYSQL_BIN_LOG::purge_logs (this=0x2e83640, to_log=0x7fffec5eca80 "/mysqldata/mysql3340/log/binlog.000001", included=false,   need_lock_index=true, need_update_threads=true, decrease_log_space=0x0, auto_purge=false) at /mysqldata/percona-server-locks-detail-5.7.22/sql/binlog.cc:6469#7 0x000000000187b4b5 in purge_master_logs (thd=0x7ffee0000c00, to_log=0x7ffee0006600 "binlog.000001")  at /mysqldata/percona-server-locks-detail-5.7.22/sql/binlog.cc:3127

1、 环境

mysql> show binary logs;+---------------+-----------+| Log_name   | File_size |+---------------+-----------+| binlog.000001 |    198 || binlog.000002 |    154 |+---------------+-----------+2 rows in set (0.00 sec)mysql> show create table testcp \G;*************************** 1. row ***************************    Table: testcpCreate Table: CREATE TABLE `testcp` ( `id` int(11) NOT NULL, PRIMARY KEY (`id`)) ENGINE=InnoDB DEFAULT CHARSET=utf81 row in set (0.00 sec)ERROR: No query specified

主库:

mysql> insert into testcp values(20);Query OK, 1 row affected (0.43 sec)mysql> select * from testcp;+----+| id |+----+| 10 || 20 |+----+2 rows in set (0.01 sec)
mysql> show slave status \G;*************************** 1. row ***************************        Slave_IO_State: Waiting for master to send event         Master_Host: 192.168.99.41         Master_User: repl         Master_Port: 3340        Connect_Retry: 60       Master_Log_File: binlog.000002     Read_Master_Log_Pos: 417        Relay_Log_File: relay.000004        Relay_Log_Pos: 624    Relay_Master_Log_File: binlog.000002       Slave_IO_Running: Yes      Slave_SQL_Running: Yes...mysql> select * from testcp;+----+| id |+----+| 10 || 20 |+----+  

3、主库执行purge binary logs

做之前拷贝原有binlog.000001为binlog.000001bak因为等会要拷贝回去

mysql> purge binary logs to 'binlog.000002';Query OK, 0 rows affected (3.14 sec)mysql> show binary logs;+---------------+-----------+| Log_name   | File_size |+---------------+-----------+| binlog.000002 |    417 |+---------------+-----------+1 row in set (0.00 sec)

4、手动更改

将binlog.000001bak拷贝为binlog.000001,然后修改index file将binlog.000001加入回去,然后flush binary logs如下:

mysql> flush binary logs;Query OK, 0 rows affected (0.15 sec)mysql> show binary logs;+---------------+-----------+| Log_name   | File_size |+---------------+-----------+| binlog.000001 |     198 || binlog.000002 |    461 |+---------------+-----------+2 rows in set (0.00 sec)

就这样DUMP线程重新发送了一次binlog.000002的内容,POSITION的从库只能报错了如下:

mysql> select * from replication_applier_status_by_worker \G*************************** 1. row ***************************     CHANNEL_NAME:       WORKER_ID: 1      THREAD_ID: NULL    SERVICE_STATE: OFFLAST_SEEN_TRANSACTION: ANONYMOUS  LAST_ERROR_NUMBER: 1062  LAST_ERROR_MESSAGE: Worker 1 failed executing transaction 'ANONYMOUS' at master log binlog.000002, end_log_pos 386; Could not execute Write_rows event on table testmts.testcp; Duplicate entry '20' for key 'PRIMARY', Error_code: 1062; handler error HA_ERR_FOUND_DUPP_KEY; the event's master log binlog.000002, end_log_pos 386 LAST_ERROR_TIMESTAMP: 2019-05-21 14:46:26

这种操作非常不规范,如果实在要这么做考虑如下步骤:

  • 关闭全部的从库
  • 手动注册binlog文件,修改index 文件
  • flush binary logs
  • 启动从库

这种情况下将会重新初始化DUMP的偏移量指针,应该是没有问题的。还是尽量不要考虑这么做。

更多相关文章

  1. Android(安卓)全局异常处理(二)
  2. [置顶] android 捕捉异常
  3. Android(安卓)异常后如何重启应用
  4. Android全局异常统一处理
  5. Android(安卓)webview Not allowed to load local resource异常
  6. 【Android】java.lang.SecurityException: Permission Denial: s
  7. Android(安卓)Studio使用过程中遇到的问题(一)-- "value 2"异常解
  8. Android(安卓)NullPointerException解决方法(空指针异常)
  9. Android(安卓)代码混淆异常 transformClassesAndResourcesWithPr

随机推荐

  1. 说说 Android 的常见 UI 控件
  2. Android事件分发/传递机制总结
  3. (一)Android事件分发机制 - View篇
  4. Android系列之浅谈Android 3D旋转
  5. 【Android Studio使用教程1】Android Stu
  6. Flash Android ANE打包之基本流程
  7. android - 为安全而设计 - 1 - 开发文档
  8. Android数据储存
  9. Android JNI开发入门之二
  10. Android软件安全与逆向分析