Saturday, November 22, 2014

Why does TokuDB need to sync the redo log for internal XA?

Like InnoDB, TokuDB has an option that controls whether the redo log is flushed (synced) after each transaction commit. The option is tokudb_commit_sync. If you set tokudb_commit_sync=OFF, TokuDB does not sync the redo log at commit, or syncs it only periodically according to the tokudb_fsync_log_period option.

We usually run in deferred redo log sync mode. Syncing the redo log on every commit needs a lot of disk write I/O, and there are many services where the data is not that critical (losing 1 to 5 seconds of data is acceptable). Semi-sync replication and Galera Cluster also do not need the redo log to be synced on every commit.

But in the current version of TokuDB (7.5.2 at the time of writing), enabling the binary log is very expensive, even when tokudb_commit_sync is OFF. As soon as the server's binary log is enabled, TokuDB's redo log effectively switches to ON mode (sync on every transaction commit).
Tokutek says this is because of the internal XA (two-phase commit) between the binary log and the TokuDB storage engine. But I can't understand why a redo log sync is needed for internal XA. (Actually, the binary log itself is not synced when TokuDB does XA; only the TokuDB redo log is.)
InnoDB, by contrast, can flush (sync) both its redo log and the binary log asynchronously. Why does only TokuDB need to sync its redo log?

I don't know TokuDB's internal story. Anyway, I changed TokuDB so that the redo log is flushed asynchronously even when the binary log is enabled (with tokudb_commit_sync=OFF, of course).
Nothing weird happened in my simple crash scenarios, such as a machine failure and a MySQL server failure. (Of course, since tokudb_commit_sync is OFF, the last few seconds of data are lost.) The replicated data was fine too.
But we can't simply disable the TokuDB XA feature, because then the binary log and the TokuDB redo log can record transactions in different orders, and your slave can end up with different data from the master.

Binary log
  UPDATE account SET money=money*10 WHERE id=?;
  UPDATE account SET money=money+100 WHERE id=?;

Redo log
  UPDATE account SET money=money+100 WHERE id=?;
  UPDATE account SET money=money*10 WHERE id=?;
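
To see why the order matters, here is a minimal sketch that replays the two updates above in both orders against a single balance. It is only an illustration: the names `Statement`, `replay`, `times_ten` and `plus_hundred` are hypothetical, and the starting balance used in the note below is an assumption, not a value from a real system.

```cpp
#include <vector>

// Hypothetical model: a "statement" is a function applied to one balance.
using Statement = long (*)(long);

// money = money * 10
long times_ten(long money) { return money * 10; }

// money = money + 100
long plus_hundred(long money) { return money + 100; }

// Apply the statements in log order, the way a replay would.
long replay(long initial_money, const std::vector<Statement>& log) {
    long money = initial_money;
    for (Statement stmt : log) {
        money = stmt(money);
    }
    return money;
}

// Order recorded in the binary log (what the slave would apply).
long replay_binary_log_order(long initial_money) {
    return replay(initial_money, {times_ten, plus_hundred});
}

// Order recorded in the redo log (what the master's engine applied).
long replay_redo_log_order(long initial_money) {
    return replay(initial_money, {plus_hundred, times_ten});
}
```

Starting from an assumed balance of 100, the binary log order gives 100*10+100 = 1100, while the redo log order gives (100+100)*10 = 2000, so master and slave would disagree.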

Actually, I heard that a recent version of TokuDB added a parameter for disabling XA. What makes TokuDB so strictly require a redo log sync for XA?
So I asked about this on the tokudb-dev Google group, but I have not heard the reason yet. Below is the question I posted there.

-------------------------------------------------
I have a question about the internal two-phase commit of MySQL (+TokuDB).
According to the TokuDB ft-index source code (https://github.com/Tokutek/ft-index/blob/master/ft/txn/txn.cc),

toku_txn_prepare_txn() and toku_txn_commit_txn() use slightly different sync modes.

void toku_txn_prepare_txn (TOKUTXN txn, TOKU_XA_XID *xa_xid) {
    ...
    txn->do_fsync = (txn->force_fsync_on_commit || txn->roll_info.num_rollentries>0);
    ...
}

// toku_txn_commit_txn() -> toku_txn_commit_with_lsn()
int toku_txn_commit_with_lsn(TOKUTXN txn, int nosync, LSN oplsn,
                             TXN_PROGRESS_POLL_FUNCTION poll, void *poll_extra)
{
    ...
    txn->do_fsync = !txn->parent && (txn->force_fsync_on_commit || (!nosync && txn->roll_info.num_rollentries>0));
    ...
}

As you can see, toku_txn_commit_txn() takes the nosync parameter into account, and nosync is determined by the tokudb_commit_sync system variable. So if the user sets "tokudb_commit_sync=OFF", toku_txn_commit_txn() does not call fsync() for the TokuDB redo log (unless force_fsync_on_commit is set).
But toku_txn_prepare_txn() does not take this system variable into account. So for any transaction that has rollback entries, toku_txn_prepare_txn() always calls fsync() for the TokuDB redo log, even when the user sets "tokudb_commit_sync=OFF".

Is there any reason that prepare() always calls fsync() for the TokuDB redo log, even when TokuDB is configured with tokudb_commit_sync=OFF?


This is related to the thread below, where we already discussed this issue a few months ago.
-------------------------------------------------

And we are still waiting for the answer.
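
For reference, the asymmetry asked about in the question can be modeled in isolation. The sketch below copies only the two boolean expressions from the quoted ft-index excerpt into a stand-alone form; `TxnModel`, `prepare_wants_fsync` and `commit_wants_fsync` are hypothetical names for illustration, not part of TokuDB, and the real functions do much more than compute this flag.

```cpp
// Hypothetical stand-in for the TOKUTXN fields used in the quoted lines.
struct TxnModel {
    bool has_parent;             // models txn->parent != nullptr
    bool force_fsync_on_commit;  // models txn->force_fsync_on_commit
    int  num_rollentries;        // models txn->roll_info.num_rollentries
};

// Mirrors the do_fsync expression quoted from toku_txn_prepare_txn():
// there is no nosync input, so tokudb_commit_sync cannot influence it.
bool prepare_wants_fsync(const TxnModel& txn) {
    return txn.force_fsync_on_commit || txn.num_rollentries > 0;
}

// Mirrors the do_fsync expression quoted from toku_txn_commit_with_lsn():
// nosync (derived from tokudb_commit_sync) can suppress the fsync.
bool commit_wants_fsync(const TxnModel& txn, bool nosync) {
    return !txn.has_parent &&
           (txn.force_fsync_on_commit ||
            (!nosync && txn.num_rollentries > 0));
}
```

In this model, a top-level transaction with rollback entries and nosync set gets no fsync at commit but still gets one at prepare, which is the behavior observed once the binary log (and therefore internal XA) is enabled.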