MySQL 8.0: Query Optimizer Takes Data Buffering into Account
Mysql 21-Feb-2019

MySQL 8.0: Query Optimizer Takes Data Buffering into Account

In earlier versions of MySQL, the query optimizer did not distinguish between data that was cached in the database buffer and data that had to be read from disk. The main reason was that the optimizer had no information about whether a table would have to be (partially) read from disk or already was present in the buffer pool. In MySQL 8.0, this has changed. InnoDB will now provide buffer estimates per table and index.

The default values for the configurable cost constants that were added in MySQL 5.7, have also been changed to reflect different costs of memory and disk access:

 

The settings for these cost constants can be configured by updating the tables mysql.server_cost and mysql.engine_cost. To easier determine what are the default settings for these constants, we have in MySQL 8.0 added a column that will show the default value.

Example: DBT-3 Query 8

To show the benefits of condition filtering, we will look at Query 8 of the DBT-3 benchmark:

SELECT o_year,
       SUM(CASE WHEN nation = 'FRANCE' THEN volume ELSE 0 END) / SUM(volume) AS mkt_share
FROM (
    SELECT EXTRACT(YEAR FROM o_orderdate) AS o_year,
           l_extendedprice * (1 - l_discount) AS volume, n2.n_name AS nation
    FROM part
    JOIN lineitem ON p_partkey = l_partkey
    JOIN supplier ON s_suppkey = l_suppkey
    JOIN orders ON l_orderkey = o_orderkey
    JOIN customer ON o_custkey = c_custkey
    JOIN nation n1 ON c_nationkey = n1.n_nationkey
    JOIN region ON  n1.n_regionkey = r_regionkey   
    JOIN nation n2 ON s_nationkey = n2.n_nationkey
    WHERE r_name = 'EUROPE' AND o_orderdate BETWEEN '1995-01-01' AND '1996-12-31'
      AND p_type = 'PROMO BRUSHED STEEL'
) AS all_nations GROUP BY o_year ORDER BY o_year;

Query 8 is called National Market Share Query, and it finds the market share in Europe for French suppliers of a given part type. You do not need to understand this query in detail. The main point is that 8 tables are joined, and that it is important to find an efficient join order for the query to perform well.

Below we show two possible query plans for this query:

Plan A

Plan B

The good thing about Plan A is that it processes the part table early. This is good because the only high selectivity condition in this query is on part type. The drawback of plan A is that it uses a secondary index on the largest table, lineitem, while Plan B only use primary key indexes. Executing these query plans in MySQL 8.0.3 on a DBT-3 scale factor 10 database, gives the following execution times:

 

We see that when all data is in memory (innodb_buffer_pool_size=32G), Plan A is definitely the best plan. However, when the buffer pool is smaller (innodb_buffer_pool_size=1G), Plan B is better. This is because when using Plan A in a disk-bound scenario, the use of a secondary index on the lineitem table will give more random disk access. Also, when using a secondary index, the same table page may have to be read from disk multiple times since it may have been evicted from the buffer pool since the previous access.

If we look at what query plan the Query Optimizer will use for Query 8 in different versions of MySQL, we see the following:

  IN-MEMORY DISK-BOUND
MySQL 5.6 Plan B
MySQL 5.7 Plan A
MySQL 8.0 Plan A Plan B

 

In MySQL 5.6, Plan B was chosen for this query. This changed in MySQL 5.7 since the optimizer started take into account the filtering effect of conditions on non-indexed columns. Changing to Plan A reduced the execution time with over 90% when all data is in memory. However, in the disk-bound scenario, the execution takes 2.5 times longer than in 5.6. In MySQL 8.0 you get the best of both worlds. Plan A will be used when all data is in memory, and Plan B will be used when most data need to be fetched from disk.

Thank you for using MySQL !