Setting up MySQL replication is a good practice. Is it good enough for you? The answer is ‘NO’. You need to continually monitor your slaves to ensure that they continue to work seamlessly. Here’s an overview of the Slave variables to be monitored and the tool that will enable you to monitor them with ease.
Top variables to monitor on your slaves
Replication is best monitored by checking the following variables:
SLAVE_RUNNING: This is a global status variable and its value can be checked using SHOW GLOBAL STATUS like ‘slave_running’. It can either be ‘ON’ or ‘OFF’.
If slave_running is ‘ON’, then the slave is up and working fine, which means both the SQL thread and the IO thread are running. If either the SQL thread or the IO thread is not running then this variable would be ‘OFF’. Use can use SHOW SLAVE STATUS and try to determine whether there was an error or if the slave was stopped manually. Checking the last 100 lines of the slave’s error log should give a clue. Looking at Last_Error_Number and Last_Error_Message for specific error information will help you fix your slave.
The following variables are a part of SHOW SLAVE STATUS.
SLAVE_IO_RUNNING: It tells you if the Slave’s IO thread is able to connect to its master and if it is running fine. The possible values for this variable can be ‘YES’, ‘NO’ or ‘CONNECTING’.
If this variable reads ‘NO’ then you will have to check the Last_Error_Number and Last_Error_Message and fix your slave. As of MySQL 5.1.20, these columns are aliases for Last_SQL_Errno and Last_SQL_Error. Before 5.1.20, they indicate the error number and the error message returned by the most recently executed statement. An error number of 0 and message of the empty string mean “no error.”
SLAVE_SQL_RUNNING: It tells if the Slave’s SQL thread has started and is working fine. The possible values of this variable are ‘Yes’ or ‘No’.
If this variable reads ‘No’, then the IO thread was caused to stop. You will have to check theLast_SQL_Errno and Last_SQL_Err for more information on the cause. An error number of 0 and message of the empty string mean “no error.” The Last_SQL_Error appears in the slave’s error log.
SECONDS_BEHIND_MASTER: As the name suggests, this field tells how late your slave is. In other words, it tells the time in seconds that the Slave’s SQL thread lags while processing Master’s binary log. A continuous increase in this value is not a very good sign as it means that the slave is not able to catch up with its master. There is no threshold value for this variable to compare against, to determine if the value is high or low. It completely depends on your application, network speed etc.NOTE: Although “seconds_behind_master” is the best option available for determining slave lag available in all MySQL versions, it has been criticised for not always being accurate.
MySQL 5.5 version has a status variable MASTER_HEARTBEAT_PERIOD which when set, will send beat packages to the Slave. After the loss of a beat, the Slave IO thread will disconnect and try connecting again. Various solutions for adding a ‘heartbeat’ mechanism have been proposed and patches and plugins are available for MySQL < 5.5. If you have added such ‘heartbeat’ mechanism you should monitor that as well.
A “MySQL DBA in a box” by Webyog is an agentless GUI based tool that helps MySQL DBAs manage MySQL servers. Replication monitoring and managing include-‘Replication’ tab that gives a topological view of all the Masters and their Slaves along with the SHOW SLAVE STATUS and SHOW MASTER STATUS.
Auto-registering of slaves, given the Master details.
Notification alerts through emails/SNMP traps that are sent out in the event of replication failure.