The MySQL Enterprise Backup team is pleased to announce major improvements in incremental backup performance, starting with release 4.1.
Introduction
The current incremental backup algorithm scans all the tables to gather changed pages, even if very few tables have been modified since the previous backup, and thus results in a 'full-scan' incremental backup. As a result, an incremental backup may take about as long as a full backup. The new algorithm aims to eliminate this extra time.
The new algorithm scans only those tables that have been modified since the previous backup. It relies on table modification times, similar to an earlier improvement made for full backup. That full backup algorithm is known as optimistic full backup, hence the new improvement is named 'Optimistic Incremental Backup'. For comparison, we use optimistic full backup to refer to the performance improvements for full backup and optimistic incremental backup to refer to the new improvements for incremental backup.
In the new optimistic incremental backup algorithm, we refer to tables that have not been modified since the previous backup as 'unchanged tables', and to tables that have been modified after the previous backup as 'busy tables'. The new optimistic incremental backup therefore scans only busy tables for changed pages.
However, there is one difference between optimistic full backup and the new optimistic incremental backup. For optimistic full backup, the user has to specify either the --optimistic-time or the --optimistic-busy-tables option in order to identify the busy tables. For optimistic incremental backup, no additional parameters are necessary, because the algorithm identifies the busy tables itself by comparing table modification times with a timestamp recorded by the previous backup.
How Optimistic Incremental Backup works
The first and foremost goal for MySQL Enterprise Backup (MEB) is quality and consistency. To achieve a consistent backup during optimistic incremental backup, MEB identifies a point in time against which the modification time of tables can be compared. MEB acquires a read lock on the tables for a very short span of time to copy the non-InnoDB files. Because non-InnoDB tables cannot be modified during the lock period, MEB records a timestamp during that period, which we call the consistency time: the time at which the tables are known to be consistent.
When the optimistic incremental backup starts, it compares the modification time of each table against the consistency time. If the modification time of a table is later than the consistency time, that table has been modified after the consistency time was recorded, and optimistic incremental backup needs to scan that table for changed pages.
The above diagram depicts a sample execution of an optimistic incremental backup, as follows.
- Six tables are scanned (and possibly copied) during the previous backup operation.
- MEB notes the consistency time when tables are locked.
- Table7 is created after the tables are unlocked, while the backup is still copying the meta files.
- After the backup is finished, Table1 is updated and Table8 is created.
- When optimistic incremental backup starts, it looks for the consistency time from the previous backup.
- MEB compares the consistency time with the modification time of all the tables present in datadir. It finds that only three tables (Table1, Table7, and Table8) have been modified after the consistency time. Hence, these three tables are scanned for changed pages. The remaining tables are unchanged tables and are ignored. If any unchanged tables are modified during the optimistic incremental backup, the changes are recorded in the redo log file. These changes are applied during the apply-log (restore) phase.
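The comparison in the steps above can be sketched in a few lines of Python. This is only an illustration of the rule "modified later than the consistency time means busy", not MEB's implementation; the function name `classify_tables`, the helper `utc`, and all timestamps are hypothetical values chosen to mirror the sample execution.

```python
from datetime import datetime, timezone


def classify_tables(modification_times, consistency_time):
    """Split tables into (busy, unchanged) lists.

    modification_times: dict of table name -> last-modified datetime
    consistency_time:   datetime recorded by the previous backup
    """
    busy, unchanged = [], []
    for table, modified in sorted(modification_times.items()):
        # A table modified after the consistency time must be scanned.
        if modified > consistency_time:
            busy.append(table)
        else:
            unchanged.append(table)
    return busy, unchanged


def utc(hour, minute):
    # Hypothetical timestamps on an arbitrary day, for illustration only.
    return datetime(2017, 1, 1, hour, minute, tzinfo=timezone.utc)


# Consistency time noted while the tables were locked.
consistency_time = utc(10, 0)

# Table1..Table6 existed before the previous backup took its lock.
modification_times = {f"Table{i}": utc(9, 0) for i in range(1, 7)}
modification_times["Table7"] = utc(10, 5)   # created after unlock, backup still running
modification_times["Table1"] = utc(11, 0)   # updated after the backup finished
modification_times["Table8"] = utc(11, 30)  # created after the backup finished

busy, unchanged = classify_tables(modification_times, consistency_time)
print("scan:", busy)
print("skip:", unchanged)
```

Only the three busy tables would be scanned for changed pages; the other five are skipped.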
Notes
The consistency time is stored in a column named consistency_time_utc in the backup_history table as well as in a field with the same name in the meta file backup_variables.txt.
If the --no-locking or --no-connection options are used during backup, the backup start time is recorded as the consistency time.
During an optimistic incremental backup, if MEB is unable to discover the consistency time from the previous backup, it defaults to the older incremental backup algorithm.
In some special cases, optimistic incremental backup may perform no better than the older incremental backup algorithm.
How to Trigger an Optimistic Incremental Backup
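The fallback rule in these notes can be illustrated with a small sketch. It assumes that backup_variables.txt is an INI-style file with a section header and `key=value` lines; the sample contents, the timestamp format, and the function names are hypothetical. MEB performs this lookup itself, so the code only models the decision.

```python
import configparser


def read_consistency_time(backup_variables_text):
    """Return the raw consistency_time_utc value, or None if it is absent."""
    # Interpolation is disabled so that '%' in values is read literally.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(backup_variables_text)
    for section in parser.sections():
        value = parser.get(section, "consistency_time_utc", fallback=None)
        if value:
            return value
    return None


def choose_algorithm(backup_variables_text):
    """Model the documented fallback: no consistency time means full-scan."""
    if read_consistency_time(backup_variables_text) is None:
        return "full-scan"
    return "optimistic"


# Hypothetical file contents; the real file has more fields.
with_time = "[backup_variables]\nstart_lsn=1000\nconsistency_time_utc=1483264800\n"
without_time = "[backup_variables]\nstart_lsn=1000\n"

print(choose_algorithm(with_time))
print(choose_algorithm(without_time))
```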
The current --incremental option is extended to include the following values.
- optimistic: the optimistic incremental backup algorithm is used when --incremental=optimistic is specified for the incremental backup.
- full-scan: the older incremental backup algorithm is used when --incremental=full-scan or --incremental (with no value) is specified for the incremental backup. It scans all the tables that qualify for the backup operation. This is also the default algorithm when no argument is given for the option.
The following are some examples.
Optimistic incremental image backup using dir:directory_path:
>mysqlbackup.exe --backup-image=<image file name> --backup-dir=<temporary directory name> \
--incremental-base=dir:<previous backup directory> \
--incremental=optimistic backup-to-image
Optimistic incremental image backup using history:last_backup:
>mysqlbackup.exe --backup-image=<image file name> --backup-dir=<temporary directory name> \
--incremental-base=history:last_backup \
--incremental=optimistic backup-to-image
Optimistic incremental directory backup using dir:directory_path:
>mysqlbackup.exe --incremental-backup-dir=<incremental backup dir> \
--incremental-base=dir:<previous backup directory> \
--incremental=optimistic backup
Optimistic incremental directory backup using history:last_backup:
>mysqlbackup.exe --incremental-backup-dir=<incremental backup dir> \
--incremental-base=history:last_backup \
--incremental=optimistic backup
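For scripted backups, the four command lines above can be assembled by a small helper. The sketch below uses only the options shown in the examples; the function name `incremental_command` and the paths are hypothetical, and connection options are omitted just as they are in the examples.

```python
import shlex


def incremental_command(base, algorithm="optimistic",
                        incremental_backup_dir=None,
                        backup_image=None, backup_dir=None):
    """Build the mysqlbackup argument list for an incremental backup.

    base is an --incremental-base value such as 'history:last_backup'
    or 'dir:<previous backup directory>'.
    """
    if algorithm not in ("optimistic", "full-scan"):
        raise ValueError(f"unknown incremental algorithm: {algorithm}")
    args = ["mysqlbackup",
            f"--incremental={algorithm}",
            f"--incremental-base={base}"]
    if backup_image is not None:
        # Image backup: needs the image name and a temporary directory.
        args += [f"--backup-image={backup_image}",
                 f"--backup-dir={backup_dir}",
                 "backup-to-image"]
    else:
        # Directory backup: incremental data goes to its own directory.
        args += [f"--incremental-backup-dir={incremental_backup_dir}",
                 "backup"]
    return args


# Hypothetical path, for illustration only.
cmd = incremental_command("history:last_backup",
                          incremental_backup_dir="/backups/inc1")
print(shlex.join(cmd))
```

The returned list can be passed to `subprocess.run` as is, which avoids shell quoting problems with paths that contain spaces.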
Performance Tests
In our internal tests, we created a 1 TB database containing 10 tables, each 100 GB in size. We executed three iterations and observed the results. In the first iteration (A2), we modified only 1 table. In the second iteration (A3), we modified 2 tables. In the third iteration (A4), we modified 3 tables. We compared those times with the time taken by a full backup and by an incremental backup using the old algorithm.
For brevity, we use FB for full backup, IB for the old incremental backup algorithm, and OIB for the new optimistic incremental backup algorithm. We found that the theoretical gains, taken as the share of tables that no longer need to be scanned (A2 - 90%, A3 - 80%, A4 - 70%), almost match the practical gains (A2 - 89%, A3 - 79%, A4 - 68%). This means that OIB performance approaches that of the default IB as the number of modified tables increases, which is expected.
| Tag | Description | Time taken (hh:mm:ss) | Improvement (%) [(Bi - Ai) / Bi * 100] |
|-----|-------------|-----------------------|----------------------------------------|
| B1 | FB (10 tables; 100 GB per table) | 2:13:07 | - |
| B2 | IB (one table changed) | 2:12:18 | - |
| B3 | IB (two tables changed) | 2:12:12 | - |
| B4 | IB (three tables changed) | 2:12:11 | - |
| A1 | FB (10 tables; 100 GB per table) | 2:13:07 | - |
| A2 | OIB (one table changed) | 0:13:55 | 89 |
| A3 | OIB (two tables changed) | 0:27:43 | 79 |
| A4 | OIB (three tables changed) | 0:42:13 | 68 |
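The improvement column can be reproduced from the reported times with the formula (Bi - Ai) / Bi * 100. The sketch below also computes the theoretical gain, under the assumption that scan time is proportional to the number of equally sized tables scanned; the function names are illustrative.

```python
def to_seconds(hms):
    """Convert an 'h:mm:ss' string to seconds."""
    hours, minutes, seconds = (int(part) for part in hms.split(":"))
    return hours * 3600 + minutes * 60 + seconds


def improvement_percent(before, after):
    """(Bi - Ai) / Bi * 100 for two 'h:mm:ss' timings."""
    b, a = to_seconds(before), to_seconds(after)
    return (b - a) / b * 100


def theoretical_gain(modified_tables, total_tables):
    """Share of equally sized tables that no longer needs scanning."""
    return (1 - modified_tables / total_tables) * 100


# (IB time, OIB time, tables modified) taken from the results table.
runs = {
    "A2": ("2:12:18", "0:13:55", 1),
    "A3": ("2:12:12", "0:27:43", 2),
    "A4": ("2:12:11", "0:42:13", 3),
}

measured = {}
for tag, (ib, oib, modified) in runs.items():
    # Truncate to whole percent, as in the results table.
    measured[tag] = int(improvement_percent(ib, oib))
    print(tag, measured[tag], round(theoretical_gain(modified, 10)))
```

Truncating to whole percentages gives 89, 79, and 68, matching the table.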
Conclusion
Optimistic incremental backup offers significant advantages over the older incremental backup algorithm, especially when changes are limited to a small set of tables. We hope you'll give it a try and provide us with feedback on how it works for your data.