Improved Redo Log Copying During Backup

MySQL Enterprise Backup Team is pleased to announce redo log performance improvements during the backup operation starting with release 4.1.

Introduction

As explained in Redo Logging in InnoDB, redo logs are written to the redo log files in a circular fashion. Unless the MySQL server is idle, it continues to generate redo logs. MySQL Enterprise Backup (MEB) copies the redo logs from the redo log files to the ibbackup_logfile during a backup operation. MEB copies the redo logs in order to ensure that it does not miss any transaction while the physical backup is being done. MEB continues to copy the redo logs until it reads all the files from the data directory. If the redo log records are written faster than they can be processed by MEB, the backup operation fails with the following error.

Log scan was only able to reach <LSN1>,

but a checkpoint was at <LSN2>.

This means that the database server has overwritten

a part of the circular InnoDB log files before ibbackup

was able to read it.

To solve this problem either:

1) re-run the backup when the database has less load, or

2) re-configure the database with bigger InnoDB log files.

We have improved MEB so that this problem is unlikely to happen in most cases.

Background

As explained in Parallel backup, MEB uses different types of threads during an operation. The Following diagram depicts the typical active threads that are working during a backup operation prior to the 4.1 release.

Main thread

This thread forks other child threads and monitors their execution.

Reader threads

Read the files in segments to be copied and pass them to the processor threads.

Processor threads

Do the required processing (for instance - compress, encrypt and validate) on the segments and pass the processed segments to the writer threads.

Writer threads

Write the segments to the target location.

Progress thread

Is forked only if the --show-progress option is specified. It prints the overall progress made by MEB.

Redo log thread

Continues to copy the redo logs until notified by a reader thread to stop once reader threads have finished reading the files. While multiple reader threads keep reading the files from the data directory, there is only one redo log thread that performs the following three tasks:

Reads redo log segments of 1MB size each from redo log files in server. If there is no redo log then go to sleep for 10 milliseconds and retry to read more redo log after wake up
Validates the redo log segments for unlikely problems (for instance – corruption and partial read) in the redo log segment.
Writes the redo log segments to ibbackup_logfile in the backup.

It is evident that the redo log thread does a lot of work, which could be split into multiple threads much like the reader, processor, and writer threads.

The improvement

In the 4.1 release, we have split the one redo log thread into three different threads as depicted in the diagram below.

Reader redo log thread

Reads redo log segments of 1 MB size each and passes them to the Processor redo log thread. MEB creates a 16 MB buffer queue to keep these redo log segments in order to pass them among the three redo log threads. If there is no redo log to read, then the thread sleep with a duration that varies between 10 milliseconds to 1 second based on the redo log generation pattern. This ensures that the reader redo log thread does not wake up too early and waste CPU cycles and to avoid a sleep duration that may cause the same problem which we have fixed through this improvement.

Processor redo log thread

Validates the redo log segments, and passes them to the writer redo log thread.

Writer redo log thread:

Writes the segments to the ibbackup_logfile in the backup.

Test Results

We performed a few tests in two different environments as follows. In both environments, redo log files in server were configured to be on a RAMDISK and backup was performed on a hard disk. This was done to simulate the use cases where backups are usually performed in the slower disks compared to the server is located. This configuration ensured that server could write the redo log files as fast as it could and we could identify the latency among the Server writing the redo logs, MEB reading, processing and writing the same.

Environment-01:

Server Repository Options:
- innodb_log_files_in_group = 2
- innodb_log_file_size = 512 MB
- innodb_log_buffer_size = 16 MB
Initial database size = 687 GB
After 30 seconds, sysbench clients connect to the server and insert one million rows/per tables
Number of tables were the same as the number of clients.
Redo write rate = ~60 MB/s

MEB version	Number of clients	Backup status
4.1	100	Completed successfully
4.1	500	Completed successfully
4.1	1000	Completed successfully
4.0.3	2	Failed with redo log overwrite issue

We kept the redo log files size small so that the redo log file switch would happen frequently. That may result in a case where MEB is not able to read as fast as the server was writing, but in the test, the case did not occur. That is, the threads completed every time using the three iterations mentioned above. In each iteration, there was approximately 150GB of data in the ibbackup_logfile.

Environment-02:

Server Repository Options:
- innodb_log_files_in_group= 6
- innodb_log_file_size = 4 GB
- innodb_log_buffer_size = 16 MB
Initial database size = 687 GB
After 30 seconds, sysbench clients connect to the server and insert three million rows/per tables\
Number of tables were same as number of clients.
Redo write rate = ~90 MB/s

MEB version	Number of clients	Backup status
4.1	2000	Completed successfully
4.0.3	2	Failed with redo log overwrite issue

Here, we kept the redo log files size larger to widen the gap between the server writing the redo log and MEB reading it from the redo log files. We increased the number of clients this time and noticed a higher redo write rate but backup completed without any problems. In this test, there was approximately 850GB of data in the ibbackup_logfile.

Improved Redo Log Copying During Backup

Introduction