Upgrading your MySQL Server Farm
Mysql 19-May-2017

Upgrading your MySQL Server Farm

There are several advantages to keeping up to date with the latest major versions of MySQL. To provide a short example using MySQL 5.7:

  • Let application developers benefit from new functionality, e.g. JSON in 5.7
  • Use your hardware infrastructure more efficiently with parallel slaves in 5.7
  • Improve monitoring or management for DevOps with more performance schema instrumentation in 5.7
  • Improve security by using features like  transparent data encryption, firewall, auditing, and sha256 authentication in 5.7
  • Keep within the EOL window for access to maintenance releases and support

This blog post explains general activities in an upgrade project and what to look out for prior to an upgrade between major MySQL Server releases. I use the example of ‘web companies’ because they commonly have 24x7x365 requirements, but much of this advice applies to all deployment types of MySQL.

Our goal in the MySQL development team is to make upgrades as smooth as possible, with no downtime (or only minimal downtime) required.

Context

A typical situation for a Web company providing online services to its customers is to have a set of end user applications running on top of a backend MySQL Server farm. The server farm will typically use replication for redundancy and scale out. The Web company will want to eliminate or minimize service downtime while an upgrade is going on. Depending on size and complexity of the Server farm this can be a challenging and time consuming task.

Some Web companies will take an incremental test in production approach, perhaps start with setting up a single slave (new version)  taking in the production replication stream but without an application load. In this way they can assure that the new version is stable and handles the  replication stream coming from an old master. Another approach can be to upgrade a non critical part of their MySQL infrastructure and gain confidence that way. Web companies will typically also want to test the new Server version in a test environment and qualify it as much as they can before putting it into production.

In general, what you want is a number of learning exercises to build your knowledge and confidence in the next version. This helps you tweak your monitoring to handle the possible nuances in how it works compared to the previous version as a progressive exercise.

Setting up a Test Server

The easiest way to set up a test server is to start with an empty old (current) server and then restore a backup from the relevant production system. Sometimes data will first need to be obfuscated due to customer privacy concerns or rules. This will give you an old server with representative data.

The next step will be to perform the MySQL in-place upgrade procedure:

  1. stop server (old version)
  2. change binaries (new version)
  3. adjust configuration (my.cnf for new version)
  4. start server (new version)
  5. run mysql_upgrade (new version)
  6. restart server (new version)
This will give you a ready to use new server. (There are some caveats here which will be discussed below).
 

An alternative is to seed the test server with data from a logical mysqldump (not covered in this blog post). Inplace upgrade is normally preferred since it is faster. Dumping the data and loading it somewhere else again takes time, an inplace upgrade has the data ondisk and so the downtime is reduced to the time to stop the server, swap out binaries and run mysql_upgrade.

The final step is to test the new server version using test load from existing user applications. Some Web companies have reported that it would be nice to have T-split functionality in routers, i.e. sending a copy of the incoming user requests both to the real production system as well as to the test server. There is ongoing work called mirroring in ProxySQL to support such functionality.

User Applications

An upgrade from an old server version to a new server version has implications for user applications as well as for the server. There will be a time period where the server farm has both old and new server versions running, for example some old masters replicating to some new slaves during a rolling upgrade. Some time during the upgrade process the user applications will switch from talking to the old server version and start talking the new server version. Typically, this means that the user applications need to be prepared for both old and new server versions.

An example of a application difference between 5.6 and 5.7 is GET_LOCK(). Simon Mudd pointed out that we improved the functionality (good) but  but didn’t leave a backward compatible behavior (bad). There are two issues people can run into if they uses GET_LOCK(): First,  in 5.6 only a single simultaneous lock can be acquired and GET_LOCK() releases any existing lock.  Second, MySQL 5.7 enforces a maximum length on lock names of 64 characters. Previously, no limit was enforced. In both cases users might have code that depend upon the 5.6 behavior and need to adjust to the 5.7 behavior by changing application code.

As an aside, this particular incompatibility can be addressed by the query rewrite plugin. However, it still requires user applications to be aware that they rely on the old behavior.

Upgrading the Connector?

As a starting point one needs to check that the client side connector in use supports both server versions, and if not, upgrade to a connector which does support both versions. Upgrading to the latest connector is generally recommended to be able to benefit fully from added features. Note that each new server release comes with an up-to-date, forward and backward compatible C Client library (libmysqlclient). New versions of other connectors will typically be available at Server GA date or shortly thereafter.

There are also many non-Oracle supported drivers around which needs to be checked for compatibility with server versions like the go sql driver.

Qualifying the Applications for a new Server version

So, after possibly upgrading the connector the next step is to qualify and potentially change user applications to work with the new server version. Web companies will typically test the set of existing user applications with the new server version. In most cases they will work out-of-the box because MySQL tends to be backward compatible. But Web companies will also read the release notes to check for things they should be aware of and things they should test in more depth.

Things that might cause problems for existing user applications:

  • Removed syntax. In general, some syntax might be legal in the old release but illegal in the new release. Historically MySQL has been very conservative and not removed any SQL syntax to keep compatibility between versions. In the future we will likely remove some non standard SQL syntax where it makes sense, for example in MySQL 8.0 we will remove “\N” as a synonym for NULL. In MySQL 5.7 we deprecated a number of GIS functions and introduced aliases to become standards compliant. In the future we expect to remove the old functions.
  • Removed data type. Some data types might exist in the old version but being removed from the new version, for example the YEAR(2) data type that was deprecated in MySQL 5.5 and removed in MySQL 5.7.
  • Data restrictions. Some data values might be legal in the old version and illegal in the new version, for example NULL value being allowed in old version but not allowed in new version. Such changes are rare but can happen to align with SQL standard or to fix critical bugs.
  • New keywords. Some words might be legal user defined identifiers in the old version but are reserved words (keywords) in the new version.
  • Changed behavior. Some statements might be legal in both old and new versions but behave differently in the old versus new version. We only make such changes to fix critical bugs, going from incorrect behavior to correct behavior.  An example is INSERT DELAYED that was deprecated in MySQL 5.6. In MySQL 5.7, the server recognizes but ignores the DELAYED keyword, and handles the insert as a non delayed insert.
  • Change in defaults. Because of MySQL’s concept of SQL modes, there can exist situations where the same behavior exits both in old and new servers, but the default settings might have changed in the new version. If so, the old behavior can be reestablished by setting the SQL mode explicitly.

Note that such (potential) changes are also relevant for Statement Based Replication (SBR). If you are doing a rolling upgrade using SBR you will have the situation with an old master and one or more new slaves. All your write load will then first be executed on the old master, the statements will then be shipped to the new slave and re-executed by this new server version. The normal case is that the statement executes correctly on both master and slave (despite differences in major versions). If the functionality is removed from the new version slave then it will execute on the master but fail on the slave. In the worst case it executes both on master and slave but with a different results (causing data drift). In some cases, the simple fact  that the engine may return rows in different order may make things non-deterministic (think: “UPDATE … WHERE … LIMIT 1”). This can be a hard problem to chase down and is the main reason why we advocate extreme care when doing rolling upgrade using SBR. This is also why we recommend Web companies to move to Row Based Replication (RBR), as a means to protect against non-deterministic re-executions.

Although explained in the release notes it can sometimes be hard to identify all relevant changes, and it can be useful with additional sources of information such as blog posts specifically targeting upgrade between given releases.

Replication & Rolling Upgrade

Most Web companies will use rolling upgrades when upgrading the server farm. In its simplest form there will be one master and many slaves running the old version. You will need some redundant capacity since you take out one server at the time. Rolling upgrades means that you stop one slave, upgrade it to the new version with in-place upgrade (offline), start the new version and let it catch up from the master binlog until it is up to date, and then finally make it available for user load. Then you do the same with the next slave and so on until all slaves are new. When done and everything is stable you stop the master and promote one of the slaves to become the new master (failover). You may then upgrade the old master to become a new slave. There are variations of the procedure, whether your topology is more or less complex, like chains of servers, etc. But the principle is the same.

When it comes to replication the main mechanism in action here is that newer servers understands binlog records from older servers. This is a property MySQL has maintained in the past and will likely maintain in the future. But note the difference between binlog records themselves and the content of those records. For example, as mentioned above, the content might be SQL statements in SBR, and although they might be transported over to a new slave just fine, the new slave might not understand the SQL statement coming. Another example — which is relevant also for row based replication (RBR) — is the JSON data type which has a binary format, thus, both the old master and the new slave needs to have a common understanding of the JSON binary. Also note that, for RBR, the storage engine row record format is independent of the binlog record format. This means that even though the binlog format and protocol is stable one still need to watch out for potential changes to the row record format.

In-place Upgrade Revisited

As mentioned above, the MySQL in-place upgrade procedure is to stop the old server, replace the binaries with the new version, start the new version, run the mysql_upgradescript, and finally restart the server.

When you start the new server the first time it will look at startup options, read configuration files with their configuration variable settings, and read meta-data like FRM files and the system tablespace. Essentially, the new server version reads disk artifacts used and produced by the old version. If some startup options or configuration variables have been removed in the new version you might need to edit your customized startup script or your configuration files. Disk artifacts like FRM files or tablespaces are  backward compatible by design (sometimes requiring extremely hard work by MySQL Engineers) so that a newer server will be able to understand an older disk artifact.

The next step is to run the mysql_upgrade script which runs a set of SQL statements on the new server instance. In most cases the SQL statements are about making changes to the system tables (meta-data changes), like ALTER TABLE, UPDATE, INSERT. The mysql_upgrade script also installs correct Performance Schema and SYS Schema tables, views, stored routines, etc. In rare cases it might also convert user data, a lesson learned when we introduced the new timestamp format in MySQL 5.6. The new timestamp format was able to store micro seconds precision, and for this we needed a new on disk storage format. In 5.610 GA we kind of got it right, the ALTER TABLE in MySQL 5.6.10 could alter a table such that the result had temporal columns in both 5.5 and 5.6 format. Then, in an attempt to speed up conversion to the new format, we changed the behavior of ALTER TABLE to convert temporal from old to new format in 5.6.16 (bad!). We got promptly arrested by Simon Mudd who tried it and reported time spent in mysql_upgrade could take days with high data volumes. Finally, we introduced the possibility to choose in 5.6.26 by introducing the  system variable avoid_temporal_upgrades ON/OFF (OFF by default). The end result is that users who have upgraded from 5.6 to 5.7 might have timestamps with both old and new formats if they have turned this option ON. The cost of this flexibility is that MySQL carry the problem of two disk formats with us into the future. In 8.0 we are evaluating to only support the new format. If so, users will need to convert while they are on 5.6 or 5.7, before attempting to upgrade to 8.0.

So, the speed of upgrading a single instance is important, especially in a rolling upgrade scenario. With a rolling upgrade the upgraded server will need to do catch up from the master binlog. If the upgrade is time consuming the master binlog might wrap over or running out of space, making it impossible for the slave to get relevant binlog records in the catch up phase. A long upgrade causes a large replication tail.  If it takes too much time, the “wannabe” slave is unable to catch up in a realistic time.

The initial startup is reasonably fast, especially if you did a clean InnoDB shutdown. In the case of no clean shutdown then standard InnoDB recovery processing takes place, which might take some time depending upon the amount of undo and redo log records which need to be processed. Running the mysql_upgrade script is also reasonably fast in the cases where it only updates meta-data. However, in situations where  the mysql_upgradescript has to update user data it can become very time consuming, especially if it needs to build a new big table  to do the conversion. We experienced this in 5.6 with the new timestamp format. Lesson learned was that time matters and try to avoid such a thing in the future.

Changes to Configuration Variables and Defaults

Configuration variables might have been removed in the new version or their default value might have changed.

MySQL has the tradition of being very conservative with respect to removing configuration variables (some would say “too conservative”). And we try really hard to never change the semantics of an existing variable. But we think it is a good general goal to reduce the number of configuration variables. Our goal in MySQL development is to document removed variables and be clear about motivations and implications.

Morgan Tocker has been blogging about proposed changes to defaults, with general guidelines for new defaults, for example “Must work on virtual machines and cloud instances out of the box”. We think that it is important to have good defaults for most people, most of the time.

However, in the context of an upgrade the Web company will often prefer to decouple the upgrade from change in behavior (as much as possible). The reasoning will be “let me first worry about the upgrade, then gradually introduce changes” and “I can hopefully introduce the changes one by one without downtime”. The first part can be addressed by MySQL by always allowing the reverse setting, thus allowing the DevOp to explicitly set the old value (old behavior). The second part can be addressed by MySQL by allowing all relevant variables to be dynamic, that is, settable without restarting the server.

Morgan Tocker has also put out a set of compatibility configuration files on Github which can be used as a resource when you are planning your upgrade.

Impact on Operations & DevOps

In addition to the impact on the upgrade procedure and on user applications, and upgrade can also have implications for the operational environment, like integration with the operating system such as systemd on Linux,  Puppet, security settings, resource control, monitoring, management, etc. We hope to be good at documenting such impacts in our user documentation.

Preparing for the Next Major Upgrade

There are changes you can do on your current system to prepare for the next major upgrade.  A general advice is to look for deprecated features. Ask “do I use a deprecated feature” and if yes do what is necessary to manage without it. For example while you are on either 5.6 or 5.7 we recommend to move away from the old and onto the new  timestamp format. Another recommendation is to move to GTID while you are on 5.7. This can be done online once 5.7 is installed everywhere (but not prior to that).  Doing things like this up front will generally simplify your next major upgrade.

Conclusion

Upgrading to new major MySQL versions is a must for Web companies. Upgrades can be perceived as being time consuming and risky by those responsible, typically the DevOps. We in MySQL Development can make a big difference by making MySQL upgrade friendly and by providing accurate and easy to understand upgrade documentation. This blog post contributes to this overall goal by explaining important concepts and potential pitfalls.  My intention is to follow up with blogs posts addressing specific releases, such as upgrading from 5.7 to the upcoming 8.0.

Happy upgrading! And THANK YOU for using MySQL!