2014년 9월 17일 수요일

Problem of MySQL Online DDL

MySQL Online DDL

Online DDL(Online schema change) is new feature (every DBA most wanted ^^) of MySQL 5.6 and MariaDB 10.0.
Online DDL is processed through bewow 4 task in sequence.

1) InnoDB::ha_prepare_inplace_alter_table(..)
2) InnoDB::ha_inplace_alter_table(..)
3) InnoDB::ha_commit_inplace_alter_table(..)
4) mysql_rename_table(..)

MySQL server will allocate memory buffer for online transaction logging in phase 1. This buffer is used for logging row change requested from users.
This row log buffer is allocated a unit of as much as sort_buffer_size, and it would be grow up to innodb_online_alter_log_max_size.
In phase 2, ALTERing thread will copy old table's row to newly created table applied new schema.
When copying rows is done, MySQL will take exclusive lock for target table and applying row log buffer to newly created table.

There's a few failure case, But most unpredictable case (I think) is row log buffer overflow. 
MySQL server will rollback ALTER statement when row log buffer is not sufficient. If you spend a lot of time until row log overflow, your are just waisting your precious time.
Even worse, MySQL server will not tell you how much row log buffer is used and how much free is there. All we can do is just allocating big memory and waiting for mysql server's judgment.
And MySQL manual didn't say anything about how much memory need for online DDL ㅠㅠ.


How DML affected row log buffer usage

So I tested usage of online row log buffer simply.
All test scenario is simple like below and print out increment of row log buffer usage (Added small fprintf code and recompiled it).
----------------------------------------------------------------------
Session1 : ALTER TABLE tb_onlineddl ..
Session2 : INSERT INTO tb_onlineddl ..
Session2 : UPDATE tb_onlineddl SET ..
Session2 : DELETE FROM tb_onlineddl WHERE ..
----------------------------------------------------------------------



CASE-1 : Adding column

CREATE TABLE `tb_onlineddl` (
  `fdpk` int(11) NOT NULL AUTO_INCREMENT,
  `fd1` char(100) NOT NULL,
  `fd2` char(100) NOT NULL,
  `fd3` char(100) NOT NULL,
  `fd4` char(100) NOT NULL,
  `fd5` char(100) NOT NULL,
  PRIMARY KEY (`fdpk`)
) ENGINE=InnoDB AUTO_INCREMENT=131071 DEFAULT CHARSET=latin1

MariaDB [test]> alter table tb_onlineddl add fd6 char(100), lock=none, algorithm=inplace;

MariaDB [test]> insert into tb_onlineddl values (null, 'insert', 'insert', 'insert', 'insert', 'insert');
Query OK, 1 row affected (0.00 sec)

MariaDB [test]> update tb_onlineddl set fd1='update' where fdpk=10;
Query OK, 0 rows affected (0.00 sec)
Rows matched: 1  Changed: 0  Warnings: 0

MariaDB [test]> delete from tb_onlineddl where fdpk=4;
Query OK, 1 row affected (0.00 sec)

>> DEBUG for INSERT :: online ddl row log buffer usage : Current: 519 , Total: 519,  Max: 134217728
>> DEBUG for UPDATE :: online ddl row log buffer usage : Current: 519 , Total: 1038, Max: 134217728
>> DEBUG for DELETE :: online ddl row log buffer usage : Current: 22 ,  Total: 1060, Max: 134217728
(in byte)

In this test, I learned that MySQL store whole record to row log buffer when I update just one column. But on DELETE statement, MySQL server store just primary key and some overhead bytes.

CASE-2 : Dropping column


CREATE TABLE `tb_onlineddl` (
  `fdpk` int(11) NOT NULL AUTO_INCREMENT,
  `fd1` char(100) NOT NULL,
  `fd2` char(100) NOT NULL,
  `fd3` char(100) NOT NULL,
  `fd4` char(100) NOT NULL,
  `fd5` char(100) NOT NULL,
  `fd6` char(100) NOT NULL,
  PRIMARY KEY (`fdpk`)
) ENGINE=InnoDB AUTO_INCREMENT=131071 DEFAULT CHARSET=latin1

Connection1 [test]> alter table tb_onlineddl drop fd6, lock=none, algorithm=inplace;

Connection2 [test]> insert into tb_onlineddl values (null, 'insert', 'insert', 'insert', 'insert', 'insert', 'insert');
Query OK, 1 row affected (0.01 sec)

Connection2 [test]> update tb_onlineddl set fd1='update' where fdpk=11;
Query OK, 0 rows affected (0.01 sec)
Rows matched: 1  Changed: 0  Warnings: 0

Connection2 [test]> delete from tb_onlineddl where fdpk=2;
Query OK, 1 row affected (0.00 sec)

>> DEBUG for INSERT :: online ddl row log buffer usage : Current: 620 , Total: 620,  Max: 134217728
>> DEBUG for UPDATE :: online ddl row log buffer usage : Current: 520 , Total: 1140, Max: 134217728
>> DEBUG for DELETE :: online ddl row log buffer usage : Current: 21 ,  Total: 1161, Max: 134217728

In this test, not much difference from CASE-1. One difference is that UPDATE statement use a little less row log buffer when ALTER DROP COLUMN.


CASE-3 : Dropping column (With TEXT/BLOB type column) 

CREATE TABLE `tb_onlineddl` (
  `fdpk` int(11) NOT NULL AUTO_INCREMENT,
  `fd1` char(100) NOT NULL,
  `fd2` char(100) NOT NULL,
  `fd3` char(100) NOT NULL,
  `fd4` char(100) NOT NULL,
  `fd5` char(100) NOT NULL,
  `fd6` char(100) NOT NULL,
  PRIMARY KEY (`fdpk`)
) ENGINE=InnoDB AUTO_INCREMENT=131071 DEFAULT CHARSET=latin1

Connection1 [test]> alter table tb_onlineddl drop fd6, lock=none, algorithm=inplace;

Connection2 [test]> insert into tb_onlineddl values (null, 'insert', 'insert', 'insert', 'insert', 'insert', 'insert');
Query OK, 1 row affected (0.01 sec)

Connection2 [test]> update tb_onlineddl set fd1='update' where fdpk=10;
Query OK, 0 rows affected (0.01 sec)
Rows matched: 1  Changed: 0  Warnings: 0

Connection2 [test]> delete from tb_onlineddl where fdpk=6;
Query OK, 1 row affected (0.00 sec)

CREATE TABLE `tb_onlineddl` (
  `fdpk` int(11) NOT NULL AUTO_INCREMENT,
  `fd1` char(100) NOT NULL,
  `fd2` char(100) NOT NULL,
  `fd3` char(100) NOT NULL,
  `fd4` char(100) NOT NULL,
  `fd5` text,
  `fd6` char(100) DEFAULT NULL,
  PRIMARY KEY (`fdpk`)
) ENGINE=InnoDB AUTO_INCREMENT=131072 DEFAULT CHARSET=latin1


>> DEBUG for INSERT :: online ddl row log buffer usage : Current: 527 , Total: 527, Max: 134217728
>> DEBUG for UPDATE :: online ddl row log buffer usage : Current: 425 , Total: 952, Max: 134217728
>> DEBUG for DELETE :: online ddl row log buffer usage : Current: 22  , Total: 974, Max: 134217728

In this test, Large object column (BLOB/TEXT) also use some bytes just needed.


CASE-4 : Adding column (With Index) 

CREATE TABLE `tb_onlineddl` (
  `fdpk` int(11) NOT NULL AUTO_INCREMENT,
  `fd1` char(100) NOT NULL,
  `fd2` char(100) NOT NULL,
  `fd3` char(100) NOT NULL,
  `fd4` char(100) NOT NULL,
  `fd5` char(100) NOT NULL,
  PRIMARY KEY (`fdpk`),
  KEY `ix_fd1` (`fd1`),
  KEY `ix_fd1_fd2` (`fd1`,`fd2`)
) ENGINE=InnoDB AUTO_INCREMENT=131073 DEFAULT CHARSET=latin1

Connection1 [test]> alter table tb_onlineddl add fd6 char(100), lock=none, algorithm=inplace;

Connection2 [test]> insert into tb_onlineddl values (null, 'insert', 'insert', 'insert', 'insert', 'insert');
Query OK, 1 row affected (0.02 sec)

Connection2 [test]> update tb_onlineddl set fd1='update' where fdpk=11;
Query OK, 0 rows affected (0.01 sec)
Rows matched: 1  Changed: 0  Warnings: 0

Connection2 [test]> delete from tb_onlineddl where fdpk=8;
Query OK, 1 row affected (0.01 sec)

>> DEBUG for INSERT :: online ddl row log buffer usage : Current: 519 , Total: 519 , Max: 134217728
>> DEBUG for UPDATE :: online ddl row log buffer usage : Current: 519 , Total: 1038, Max: 134217728
>> DEBUG for DELETE :: online ddl row log buffer usage : Current: 22  , Total: 1060, Max: 134217728

In this test, Secondary index does not affect row log buffer usage. MySQL server will rebuild secondary index with clustered key(primary key which contains whole columns' value).

So we can assume that apply row log phase(this phase need exclusive lock on table) will be slower when there's many index on table.



How to estimate row log buffer usage during ALTER

Now we can simply estimate how much row log buffer is need for Online altering table.


CASE-1


  •   10 columns, average row size = 350 bytes
  •   Use innodb_online_alter_log_max_size = 128MB
  •   If online ddl statement is lasting 10 minutes, Maximum allowable row change (UPDATE+INSERT) is 633 (128MB/350Bytes/60sec/10min) / second.
  •   If online ddl statement is lasting 30 minutes, Maximum allowable row change is 211/second.
  •   If online ddl statement is lasting 60 minutes, Maximum allowable row change is 105/second.



CASE-2


  •   80 columns, average row size = 1.2KBytes
  •   Use innodb_online_alter_log_max_size = 128MB
  •   If online ddl statement is lasting 10 minutes, Maximum allowable row change (UPDATE+INSERT) is 183 / second.
  •   If online ddl statement is lasting 30 minutes, Maximum allowable row change is 61/second.
  •   If online ddl statement is lasting 60 minutes, Maximum allowable row change is 30/second.


I know, this calculation is really rough. But I can't find better solution.
Even though this calculation is accurate, but estimating how many rows are updated and inserted on specific table is not so easy.



TODO

I think the best solution is that MySQL server reporting row log buffer usage and estimation of table-altering time. 

But it's not so easy to make MySQL server reporting row log buffer usage to client like MariaDB (Because of MySQL 5.5 ALTER is processed in MySQL handler, but MySQL 5.6 online DDL is handled in each storage engine).

I think the next best solution is adding new global status variables like. ..
(Global status, so if there's two or more concurrent altering will not be covered)


  • Innodb_onlineddl_rowlog_rows :: row count stored in row log buffer
  • Innodb_onlineddl_rowlog_used :: row log buffer usage (in bytes)
  • Innodb_onlineddl_progress :: Online DDL progress (in copied-rows or %)



With this status variables, DBA can estimate whether his online altering is succeed or not.
I will blog if this feature is ready. ^^