Matt, Seong Uck Lee: 6월 2014

2014년 6월 22일 일요일

Memcached Replication (3)

This is the last article about Memcached Replication.If you missed previous two article, Read below article first for understanding.

http://seonguck.blogspot.kr/2014/06/memcached-replication-1.html

http://seonguck.blogspot.kr/2014/06/memcached-replication-2.html

Install

Download KMC source first from GitHub.
https://github.com/kakao/mysql_5.6.14_kmc

To build KMC, you need to install basic library needed for MySQL 5.6.
Especially there's a lot of system which lack of libaio-devel and ncurses-devel and cmake.
And you need to install libmemcached library and development package.

libmemcached-1.0.4-1.el5.remi.x86_64.rpm
libmemcached-devel-1.0.4-1.el5.remi.x86_64.rpm

Now you can build KMC, you should run cmake with a few options.

# cd mysql-5.6.14_kmc
# mkdir Release
# cd Release
# cmake .. \
'-DBUILD_CONFIG=mysql_release' \
'-DCMAKE_INSTALL_PREFIX=/usr/local/mysql' \
'-DWITH_INNODB_MEMCACHED=ON' \
'-DENABLED_LOCAL_INFILE=OFF' \
'-DHAVE_QUERY_CACHE=OFF' \
'-DOPTIMIZER_TRACE=OFF' \
'-DENABLE_DEBUG_SYNC=OFF' \
'-DENABLED_PROFILING=OFF' \
'-DWITH_ARCHIVE_STORAGE_ENGINE=OFF' \
'-DWITH_EMBEDDED_SERVER=OFF' \
'-DENABLE_DTRACE=OFF'

Of course you don't need to all options, but you should put "DWITH_INNODB_MEMCACHED" for Memcached plugin of MySQL.
After cmake, just run make && make install. After installing you can find built MySQL executables in the directory you specified as INSTALL_PREFIX.
Before starting MySQL server, you should run $MYSQL_HOME/scripts/mysql_install_db script to create default dictionary schema as normal MySQL server.

Creating Memcached related schema.

After starting MySQL server, you should create Memcached related schema to activate Memcached plugin.
Initialization script is located in your MySQL home directory. Just run it.
Actually this procedure also need to original MySQL Memcached plugin not only for KMC.

mysql> source $MYSQL_HOME/share/innodb_memcached_config.sql
mysql> use innodb_memcache
mysql> show tables

Check below three tables have created in innodb_memcache database.

cache_policies
config_options
containers

Now we need to initialize KMC basic schema.

mysql> USE innodb_memcache;
mysql> INSERT INTO `containers` VALUES ('default','kmc','kmc_template','k','v','f','c','e','PRIMARY');

mysql> CREATE DATABASE kmc;
mysql> USE kmc;

mysql> DROP TABLE IF EXISTS `kmc_template`;
mysql> CREATE TABLE `kmc_template` (
`k` varchar(255) CHARACTER SET latin1 COLLATE latin1_bin NOT NULL DEFAULT '',
`v` mediumblob,
`f` int(11) NOT NULL DEFAULT '0',
`c` bigint(20) unsigned NOT NULL DEFAULT '0',
`e` int(11) NOT NULL DEFAULT '0',
PRIMARY KEY (`k`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 STATS_PERSISTENT=0;

## Don't need this default row anymore, But I am not sure.
mysql> INSERT INTO `kmc_template` VALUES ('1','DO-NOT-REMOVE',0,0,0);

On this procedure, table name must start with "kmc_" prefix.

Configurations

Lastly change my.cnf configuration file for KMC. Below options are not only for KMC but also memcached performance.
We don't use KMC as InnoDB or MyISAM engine together. So I changed InnoDB options minimally.
This is not so strict options(except kmc_connect_string and binlog-format), so you can change it for your system standardizations.

## InnoDB ------------------------
...
innodb_data_file_path = ibdata1:64M:autoextend
...
innodb_log_file_size = 32M
innodb_log_files_in_group = 2
innodb_log_buffer_size = 16M

## Memcached --------------------
daemon_memcached_option = '-m 20480 -p 11211 -c 80000 -t 8 -s /tmp/memcached.sock'
innodb_api_enable_binlog = 1
innodb_api_trx_level=0 ## READ-UNCOMMITTED
innodb_api_bk_commit_interval=1
daemon_memcached_r_batch_size=1
daemon_memcached_w_batch_size=1
innodb_api_enable_mdl=OFF

## Replication:Binary log -----------
server-id = 1
## Replication query is always idempotent
slave_exec_mode = IDEMPOTENT
## Memcached connection string :: with socket file
kmc_connect_string = '--SOCKET="/tmp/memcached.sock" --BINARY-PROTOCOL --NOREPLY --TCP-NODELAY --TCP-KEEPALIVE'
binlog-checksum = NONE
sync_binlog = 0
master_info_repository = FILE
relay_log_info_repository = FILE
sync_master_info = 0
sync_relay_log = 0
sync_relay_log_info = 0
slave_checkpoint_group = 100000
slave_checkpoint_period = 1000
max_binlog_size = 100M
expire_logs_days=1
expire_max_log_files = 15
binlog-format = ROW

daemon_memcached_option is Memcached plugin startup options and kmc_connect_string option is for SQL thread of slave MySQL server.
Now starting MySQL server and check the basic function on.
And you can check replication setup and the way to check replication status out is same as original MySQL server.

Testing Memcached and Replication

MySQL Memcached plugin support both binary and text mode protocol, so we can simply test memcached operation with telnet program.

[root@matt001 ~]# telnet localhost 11211
Trying 127.0.0.1...
Connected to localhost.localdomain (127.0.0.1).
Escape character is '^]'.

set matt 0 0 9
Seong Uck
STORED

get matt
VALUE matt 0 9
Seong Uck
END

delete matt
DELETED

get matt
END

quit
Connection closed by foreign host.

If you want to check whether socket file(/tmp/memcached.sock) is working correctly, you can use netcat utility. In this case you can't use telnet.

[root@matt001 ~]# nc -U /tmp/memcached.sock
set matt 0 0 9
SEONG UCK
STORED

get matt
VALUE matt 0 9
SEONG UCK
END

quit
[root@matt001 ~]#

If you want to check the replicated data, run GET operation on slave memcached after run SET on master memcached.
Of course you can inspect the binary log file through mysqlbinlog utility.

Memcached replication feature(Added feature) does't have unit test yet. So you should be carefull.

2014년 6월 21일 토요일

Memcached Replication (2)

Modifications

As written in the first article about Memcached Replication, need to change some of MySQL 5.6 source code.
Below diagram will explain how MySQL Memcached plugin works and what I modified.

(Blue line is original behavior of MySQL 5.6 Memcached plugin's Caching and Innodb-only mode, Bold brown line means my modifications)

And the number of the diagram means ...

InnoDB API will transfer the input data to InnoDB engine(Table) as configured with innodb_api_bk_commit_interval, daemon_memcached_r_batch_size, daemon_memcached_w_batch_size system variables. And after that InnoDB API will write changes to binary log. But this is not happened on Cache-only mode. I modified MySQL code as memcached changes is also written to binary log but not apply that changes to InnoDB engine on Cache-only mode Memcached plugin. On original MySQL 5.6 Memcached plugin, dictionary table(innodb_memcache DB) and container table needs for Caching and Innodb-only cache mode not for Cache-only mode. But after modification dictionary table and container table is gotten to need for binary log writting. Of course we will not store memcached data to container table, but need it's template for binary log writting.
On slave side, SQL thread have to apply relay log gotten by IO thread to MySQL Memcached plugin NOT InnoDB tables. But on original MySQL replication, SQL thread directly access InnoDB API not Memcached. I need a way to access to Memcached plugin. I implemented it as SQL thread bring up another memcached client (using libmemcached client library). I can use Non-blocking mode mode with libmemcached client library for fast relay log applying. Actually in original MySQL 5.6, libmemcached.so shared library also exist on your $MYSQL_HOME/lib/plugin, So I renamed it as "libmemcachedserver.so" so that install libmemcached RPM separately on the same system.
Repication SQL thread act as Memcached client have to run as fast as possible, so modified SQL thread's memcached client connect to Memcached socket file rather than 11211 TCP port. And worked as Non-blocking mode for fast replication. Actually SQL thread will apply relay log event as sync-mode on every 100th request so that slave can check replication apply error.

Another modifications

Originally, Memcached plugin never open both of TCP port and Unix domain socket at the same time, But I need both channel.
Copy Memcached plugin's status metric to MySQL server's status variables so that we can monitor memcached status with MySQL monitoring tool.InnoDB master thread will copy memcached status metrics to MySQL status variables every second.

mysql> show global status like '%kmc%';
+----------------------------------+--------------+
| Variable_name | Value |
+----------------------------------+--------------+
| Innodb_kmc_connection_structures | 2004 |
| Innodb_kmc_curr_connections | 2002 |
| Innodb_kmc_curr_items | 70638969 |
| Innodb_kmc_pointer_size | 64 |
| Innodb_kmc_threads | 8 |
| Innodb_kmc_total_connections | 5090 |
| Innodb_kmc_total_items | 778120599 |
| Innodb_kmc_bytes | 18648687816 |
| Innodb_kmc_bytes_read | 460650586044 |
| Innodb_kmc_bytes_written | 497924594841 |
| Innodb_kmc_cmd_get | 7781206118 |
| Innodb_kmc_cmd_set | 778120599 |
| Innodb_kmc_evictions | 498675673 |
| Innodb_kmc_get_hits | 1043938514 |
| Innodb_kmc_get_misses | 6737267604 |
| Innodb_kmc_limit_maxbytes | 21474836480 |
+----------------------------------+--------------+

Fix related memory leak of MySQL 5.6.14 and expiration time bug of Memcached plugin.
Remove every file sync call for fast processing (if possible).
Add expire_max_log_files system variables so that we can control total binary logs' size by file count.(This features only use binary log file's suffix number, so if you use frequent "PURGE LOGS" or other commands which switch binary log file, expire_max_log_files system variables will not work as expected)
Add Binlog_purge_failed system status variable for monitoring binary log purge

0 : Okay
1 : Set when MySQL server attempt to remove ACTIVE state binary log.
2 : SET when MySQL server attempt to remove USE state binary log.

Add kmc_connect_string system variable so that you can change how SQL thread connect to it's local memcached plugin. You can change it whenever you want because it is dynamic variable. But You have to "STOP/START SLAVE" for applying change.

Usable Memcached Operations

Unfortunately on KMC(Kakao MemCached), Some of memcached operations are not usable. It's because of Memcached replication characteristic (And have not tested it).

Usable operations

GET
SET
ADD
DELETE
REPLACE

Not implemented or Not tested

INCREMENT
DECREMENT
CAS

Performance

GET operations' performance is same as Original Memcached. But SET/ADD/DELETE operations need to written to disk (Binary log file). And there's a lot of complex processing are involed even though it's not sync mode. So SET/ADD/DELETE performance is dramatically lower than original memcached server.
But general purpose of Memcached server, SET/ADD/DELETE operations is not so many. Once data item is cached on Memcached, then GET operation is performed all the time.
If not (If there's a lot of SET/ADD/DELETE), that memcached server has no effect becuase cache ratio is low. And in this case Memcached server only add overhead (I think).

I also modified SQL thread work as parallel based on row, But MySQL 5.6 multi threaded replication make a lot of sync overhead(CPU overhead) among slave threads and coordinate thread.

MySQL Memcached plugin can process more SET/ADD/DELETE operations than normal MySQL's SQL statements, but you have to consider replication.
SQL thread of slave side is working as single thread, and this might be bottleneck.
On Intel X86 commodity server, I think 10k SET/ADD/DELETE operations are limit. it's the limit of replication (No replication delay status)
So I added expire_max_log_files system variable and Binlog_purge_failed system status variable. If you have enough memory, allocate some of memory for binary log directory using RamFS or RamDisk for SET/DELETE/ADD operation's performance. Added system variable and status variable will help RamFs usage management.

Some limitations

Container table name must start with "kmc_".
Can't use CAS, INCR and DECR operations.
KMC(Kakao MemCached) use two upper bytes of Memcached flags field for originated server_id. So this is not compatible with your memcached client.
KMC will convert your DELETE operation to SET operation with 1 second expire (because we can't use flags field on DELETE operation).

Download

https://github.com/kakao/mysql_5.6.14_kmc

Memcached Replication (1)

Purpose

A lot of applications have data which have to be stored persistently. And theses data is stored on RDBMS or NoSQL solutions.
Sometimes these applications need a lot of data search operations, but current RDBMS or NoSQL solutions can't serve this requirement.
RDMBS or NoSQL solutions do really complicated internal processing for applications' request, and they have to store data to disk slowest component of computer.
So we use memory cache solution like Redis or Memcached. But they also have weak points especially Memcached doesn't have repilcation features.
Redis, this cache solution also have some weak points, this is not why I am focusing Memcached in this article.
We use a lot of Memcached on our service already, and I have to make Memcached can replicate data to remote Memcached(like slave of mysql) without migrating to Redis.

Some people doesn't feel any needs for Memcached replication.
But replication feature is necessary for multi-idc synchronization. And multi-idc synchronization is need for disaster recovery or IDC location aware services.
Redis has replication features, but Memcached server has not. Redis can serve complex data types but Memcached not. On the other hand, Memcached has it's own advantages.
(I don't want to metion about "Why Memcached is better than Redis, and Why Redis better than Memcached", What I want to say in this article is there's still a lot of people use Memcached server).
Some company made mysql server's binary log parser and relay it to Memcached server of multi IDC. Because usually cached data items are originated from database server.
And this way, solve a lot of complexity of application.

Also our company(Kakao corp) has same needs for DR or location aware services.
But above binary log parsing method need a few change of sql statement of applications. It's not fully transparent from application.

MySQL 5.6 Memcached plugin

MySQL 5.6 is released during I am looking for memcached replication method, First time I saw the MySQL Memcached plugin it seems that it doesn't have no usability.
But after a few days, Suddenly I thought we use MySQL Server's Memcached plugin as standalone memcached server and easily have memcached replication. Because Memcached data change will be written to mysql binary log through MySQL(InnoDB API). MySQL Memcached plugin has below three cache policy.

Innodb-only
Caching
Cache-only

First of all, Memcached plugin act as whole memory operation. So we could not use Inoodb-only and Caching policy because memcached data will be stored in innodb finally on this two policy. We have to use Cache-only policy. But unfortunately Memcached data is never written to binary log because memcached data is not stored in InnoDB on Cache-only policy.
And this is only the story for binary log writting. But on slave side SQL thread will relay binary log contents to InnoDB only, Not memcached plugin.

If I changed MySQL as writting data change to binary log on master and apply change to memcached plugin on slave, Memcached replication can be possible.
This is not so easy task, but it would be best way to implement it. We don't have to implement whole replication features on Memcached and MySQL replication features is really stable.

EXT3 Huge File Remove

Sometimes peoples said "Overhead during dropping InnoDB table".
Dropping table need to scan buffer pool (Especially they need to scan twice in older version).

But I think a real overhead comes from file system.
So you might be already use XFS or EXT4 to avoid this.

Simply I tested 50GB file remove (unlink system call on Linux).

< Linux iostat during file remove (unlink on ext3 filesystem) >

My server has only 6 SAS disk (4==> RAID 1+0 used for mysql data directory, and remained 2 disk used as mysql log) and 1GB Raid controller cache.

In my test, removing 50GB file took 25 seconds. During this time, there's huge disk read iops as you can see above chart. And the whole time during file unlink, disk utilization is 100%.

As you can imagine, If you run drop table which has huge file size MySQL server can't handle use requests until file remove completely. Actually this is not so weird things. it's because of EXT3 file system architecture.

On the other hand, EXT4 has several features to overcome this kind of fragmentations.

Multiblock allocator
When ext3 appends to a file, it calls the block allocator, once for each block. Consequently, if there are multiple concurrent writers, files can easily become fragmented on disk. However, ext4 uses delayed allocation which allows it to buffer data and allocate groups of blocks. Consequently the multiblock allocator can make better choices about allocating files contiguously on disk. The multiblock allocator can also be used when files are opened in O_DIRECT mode. This feature does not affect the disk format.

Delayed allocation
ext4 uses a performance technique called allocate-on-flush also known as delayed allocation. That is, ext4 delays block allocation until it writes data to disk. (In contrast, some file systems allocate blocks before writing data to disk.) Delayed allocation improves performance and reduces fragmentation by using the actual file size to improve block allocation.

I quoted this from wiki about EXT4 (http://en.wikipedia.org/wiki/Ext4)

XFS also has this kind of optimization. So XFS and EXT4 have fewer fragmented blocks than EXT3. Sometimes MySQL server performance is dropped when removing binary log file internally (Default size of binary log file is 1GB, So we changed it to 100MB).

If you can't change file system to EXT4 or XFS, then you can use linux hard link + truncate command.

< Linux iostat during file remove (truncate file 1GB each 2seconds on ext3 filesystem) >

Tested program does just truncate last 1GB amount of contents of 50GB file and sleep 2 seconds. After that truncate last 1GB amount of contents of remained 49GB. so on...

Disk utilization is really stable. And this time user requests are never blocked.

InnoDB will just call unlink system call when you drop table. And unlink system call never drop when the target file has another hard link(man unlink). So you can make another hard link for huge ibd file before and run drop table, then InnoDB drop only just one link.
After that, you can truncate the hard linked file little by little.

shell> link /mysql_data/db1/huge_table.ibd /mysql_data/huge_table.ibd.dropped
mysql > drop table huge_table;
shell> ## doing ftruncate "huge_table.ibd.dropped" little by little

I'm doing this can be possible as MySQL builtin features.
Still I'm doing read and modify mysql code (Not done yet).

https://github.com/SunguckLee/MariaDB/commit/87f05c2619c714007d40b9c07e151cb51ef6eca6