Galera Cluster Performance Tips

Galera Cluster is a synchronous multi-master replication plug-in for InnoDB. It is very different from regular MySQL replication and addresses a number of issues, including write conflicts when writing to multiple masters, replication lag, and slaves being out of sync with the master.

In this article we will see how to make some optimizations to our Galera Cluster.

Write-set Caching during State Transfers

Under normal operations, nodes do not consume much more memory than the regular standalone MySQL/MariaDB database server. The certification index and uncommitted write-sets do cause some additional usage, but in typical applications this is not usually noticeable.

Write-set caching during state transfers is the exception.

When a node receives a state transfer, it cannot process or apply incoming write-sets, as it does not yet have a state to apply them to. Depending on the state transfer method (mysqldump, for instance), the sending node may also be unable to apply write-sets.

The Write-set Cache (or GCache) caches write-sets in memory-mapped files on disk, and Galera Cluster allocates these files as needed. In other words, the only limit on the cache is the available disk space. Writing to disk in turn reduces memory consumption.

Customizing the Write-set Cache Size

You can define the size of the write-set cache using the gcache.size parameter. A good starting point is the size of the data directory.
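
For example, you might set this parameter in my.cnf through the wsrep_provider_options variable. A minimal sketch, with a purely illustrative 2G figure (size it according to the guidelines that follow):

wsrep_provider_options="gcache.size=2G"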

If disk space is a concern, there are some guidelines to consider when adjusting this parameter. One is your preferred state snapshot method: rsync, xtrabackup, and mariabackup copy the InnoDB log files, while mysqldump does not. So, if you use mysqldump for state snapshot transfers, you can subtract the size of the log files from your calculation of the data directory size.

Note

An Incremental State Transfer (IST) copies the database about five times faster than mysqldump and about 50% faster than xtrabackup or mariabackup, which means your cluster can handle relatively large write-set caches. Bear in mind, however, that you cannot provision a server with an Incremental State Transfer.

As a general rule, start with the data directory size, including any possible links, then subtract the size of the ring buffer storage file, which is called galera.cache by default.

In the event that storage remains an issue, you can further refine these calculations with the database write rate. The write rate indicates the tail length that the cluster stores in the write-set cache.

You can calculate this using the wsrep_received_bytes status variable.

  1. Determine the size of the write-sets the node has received from the cluster:

    SHOW STATUS LIKE 'wsrep_received_bytes';
    
    +------------------------+-----------+
    | Variable_name          | Value     |
    +------------------------+-----------+
    | wsrep_received_bytes   | 6637093   |
    +------------------------+-----------+
    

    Note the value and time, respectively, as recv_1 and time_1.

  2. Run the same query again, noting the value and time, respectively, as recv_2 and time_2.

  3. Apply these values to the following equation (a worked example follows this list):

    writerate = \frac{ recv_2 - recv_1 }{ time_2 - time_1}
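
As a hypothetical worked example, suppose the first query returns 6637093 bytes and the second, taken 3600 seconds later, returns 13237093 bytes. The write rate is then roughly 1833 bytes per second:

writerate = \frac{ 13237093 - 6637093 }{ 3600 } \approx 1833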

From the write rate you can determine the amount of time the cache remains valid. When the cluster shows a node as absent for a period shorter than this interval, the node can rejoin the cluster through an incremental state transfer. A node that remains absent for longer than this interval will likely require a full state snapshot transfer to rejoin the cluster.

You can determine the period of time the cache remains valid using this equation:

period = \frac{ cachesize } { writerate }
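
Continuing the hypothetical example, with the default gcache.size of 128M (134217728 bytes) and a write rate of 1833 bytes per second, the cache remains valid for about 73222 seconds, or roughly 20 hours:

period = \frac{ 134217728 }{ 1833 } \approx 73222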

Conversely, if you already know the period for which you want the write-set cache to remain valid, you can instead use this equation:

cachesize = writerate \times period
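
For instance, to keep the cache valid for 24 hours (86400 seconds) at the hypothetical write rate of 1833 bytes per second:

cachesize = 1833 \times 86400 = 158371200

That is roughly 150 MB, so a gcache.size of 160M would give some headroom.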

This equation shows how increasing the size of the write-set cache can improve performance. For instance, say you find that cluster nodes frequently request state snapshot transfers. Increasing the gcache.size parameter extends the period in which the write-set cache remains valid, allowing nodes to update through incremental state transfers instead.

Note

Consider these configuration tips as guidelines only. For example, in cases where you must avoid state snapshot transfers as much as possible, you may end up using a much larger write-set cache than suggested above.

Setting Parallel Slave Threads

There is no rule for how many slave threads you need for replication, and parallel threads do not guarantee better performance. However, parallel applying does not impair regular operation performance and may speed up the synchronization of new nodes with the cluster.

You should start with four slave threads per CPU core:

wsrep_slave_threads=4

The logic here is that, in a balanced system, four slave threads can typically saturate a CPU core. However, I/O performance can increase this figure several times over.

Parallel applying requires the following setting:

innodb_autoinc_lock_mode=2
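
Putting the two settings together, a minimal my.cnf sketch for parallel applying (the thread count uses the four-threads-per-core starting point on a hypothetical single-core host):

[mysqld]
wsrep_slave_threads=4
innodb_autoinc_lock_mode=2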

You can use the wsrep_cert_deps_distance status variable to determine the maximum number of slave threads possible. For example:

SHOW STATUS LIKE 'wsrep_cert_deps_distance';

+----------------------------+-----------+
| Variable_name              | Value     |
+----------------------------+-----------+
| wsrep_cert_deps_distance   | 23.88889  |
+----------------------------+-----------+

This value essentially determines the number of write-sets that the node can apply in parallel on average.

Note

Do not use a value for wsrep_slave_threads that is higher than the average given by the wsrep_cert_deps_distance status variable.
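
For example, on a hypothetical four-core host, the four-threads-per-core starting point gives a value that stays safely below the sample wsrep_cert_deps_distance of roughly 24 shown above:

wsrep_slave_threads=16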

Dealing with Large Transactions

Large transactions, for instance a transaction caused by a DELETE query that removes millions of rows from a table at once, can lead to diminished performance. If you find that you must frequently perform transactions of this scale, consider using pt-archiver from the Percona Toolkit.

For example, if you want to delete expired tokens from the token table in a database called keystone on the host dbhost, you might run something like this:

$ pt-archiver --source h=dbhost,D=keystone,t=token \
   --purge --where "expires < NOW()" --primary-key-only \
   --sleep-coef 1.0 --txn-size 500

This deletes the rows in small transactions (500 rows at a time here), so the cluster never has to replicate one huge write-set.
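
If you want to preview what pt-archiver would do before it touches any rows, it supports a --dry-run option that prints its queries without executing them:

$ pt-archiver --source h=dbhost,D=keystone,t=token \
   --purge --where "expires < NOW()" --primary-key-only --dry-run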
