MySQL主库xtrabackup备份导致应用异常

昨天有位同学问我：有台主库tps一下子跌没了，应用写入全部报错，她提供几个时间点给我。tps一下子跌没，我条件反射:莫非有全局读锁。开始排查
第一步：
查看数据监控 mysql thread running 23:47分左右开始上涨，到第二天01:17恢复正常。其他指标变化不是非常明显

又得到非常有用信息：因备库问题，在主库16:00启动xtrabackup，这时感觉找到问题的根源
第二步:
查看slow，一条子查询跑了23659S、flush跑了 5216S，这条子查询应该在19:00左右跑起，备份应该在23:45左右跑flush tables with read lock

# Time: 121225  1:09:54
# User@Host: xxx[xxx] @  [xxxx.xxxx.xxxx.xxxx]
# Query_time: 23659.989155  Lock_time: 0.000104 Rows_sent: 0  Rows_examined: 47501046480
SET timestamp=1356368994;
select id, parent_biz_order_id, buyer_id, status, type, gmt_modify from ttt where parent_biz_order_id 
in (select parent_biz_order_id from ttt where type='哈哈');
# Time: 121225  1:13:25
# User@Host: root[root] @ localhost []
# Query_time: 5216.430881  Lock_time: 0.000000 Rows_sent: 0  Rows_examined: 0
use mysql;
SET timestamp=1356369205;
FLUSH TABLES WITH READ LOCK;

第三步：
查看备份信息:备份脚本实际跑完（备份未成功）时间是在第二天00:02:15

+--------------+---------------------+---------------------+
| db_name      | start_time          | end_time |
+--------------+---------------------+---------------------+
| xxxxxxxxx | 2012-12-24 17:40:56 | 2012-12-25 00:02:15 |

从备份日志看出，23:46:24 xtrabackup触发flush tables with read lock，等了900秒后，因为close tables失败，最终导致备份失败。触发flush tables时间点+900S=xtrabackup脚本跑完时间end_time
备份脚本报错：
121224 23:46:24 innobackupex-1.5.1: Starting to lock all tables…
innobackupex-1.5.1: Error: Connection to mysql child process (pid=18366) timedout. (Time limit of 900 seconds exceeded. You may adjust time limit by editing the value of parameter “$mysql_response_timeout” in this script.) while waiting for reply to MySQL request: ‘FLUSH TABLES WITH READ LOCK;’ at /usr/bin/innobackupex-1.5.1 line 336.

xtrabackup源码解释：

# flush tables with read lockmysql_check();--没有问题
        mysql_lockall()if!$option_no_lock;--在执行flush tables with read lock时，mysql_send函数执行超时900S，备份失败
 
# timeout in seconds for a reply from mysqlmy$mysql_response_timeout=900;
 
sub mysql_send {my$request=shift;
 $current_mysql_request=$request;print MYSQL_WRITER "$request\n";mysql_check();$current_mysql_request='';}
 
sub mysql_lockall {$now= current_time();printSTDERR"$now  $prefix Starting to lock all tables...\n";
 
    mysql_send "USE mysql;";mysql_send "SET AUTOCOMMIT=0;";if(compare_versions($mysql_server_version,'4.0.22')==0|| compare_versions($mysql_server_version,'4.1.7')==0){# MySQL server version is 4.0.22 or 4.1.7mysql_send "COMMIT;";mysql_send "FLUSH TABLES WITH READ LOCK;";}else{# MySQL server version is other than 4.0.22 or 4.1.7mysql_send "FLUSH TABLES WITH READ LOCK;";mysql_send "COMMIT;";}write_binlog_info;write_slave_info if$option_slave_info;
 $now= current_time();printSTDERR"$now  $prefix All tables locked and flushed to disk\n";}

把整个过程连贯起来描述：

1、16:00 xtracbackup启动，因为xtrabackup会先copy innodb data file
2、19:00 查询sql跑起
3、23:45 xtrabackup加flush tables with read lock，跑成功这条语句要做三件事:1、加全局锁 2、关闭open tables 3、打上阻塞commit的flag。因为加全局锁成功，但在关闭open tables时，因为长查询，一直hang住
4、23:45以后，所有的写入都会被阻塞，应用不停创建连接，thread running升高
5、第二天1:09:54 sql跑完
6、第二天1:17:25thread running恢复正常

小结：还有几个问题没有解释，flush tables with read lock失败，thread running为什么没有在flush失败时间点下降？slow.log为什么记得5216S？

觉得文章有用？立即：和朋友一起 共学习共进步！

OurMySQL

MySQL主库xtrabackup备份导致应用异常

猜您喜欢

文章评论： “MySQL主库xtrabackup备份导致应用异常”

发表评论取消回复

猜您喜欢

文章评论： “MySQL主库xtrabackup备份导致应用异常”

发表评论 取消回复

发表评论取消回复