For a big data system, one problem is data access efficiency; another is that data insertion can be very slow. We had a service system whose data loading process took 4-5 hours. Such a time-consuming operation is risky: if the program is interrupted during loading, it may have to be rerun, which is troublesome. So it is necessary to improve insertion efficiency for big data systems.
Here we provide two optimization suggestions.
1. Combine multiple INSERT statements
A common way to write INSERT statements is:
- INSERT INTO `insert_table` (`datetime`, `uid`, `content`, `type`) VALUES ('0', 'userid_0', 'content_0', 0);
- INSERT INTO `insert_table` (`datetime`, `uid`, `content`, `type`) VALUES ('1', 'userid_1', 'content_1', 1);
If we write it as:
- INSERT INTO `insert_table` (`datetime`, `uid`, `content`, `type`) VALUES ('0', 'userid_0', 'content_0', 0),('1', 'userid_1', 'content_1', 1);
The modified version improves insertion efficiency for two reasons:
- It reduces SQL statement parsing: the combined statement only needs to be parsed once, while the separate statements are parsed twice.
- The combined statement is shorter than the two separate ones, which reduces network transfer I/O.
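The combined-statement idea can be sketched in a few lines. This example uses Python's built-in sqlite3 module purely so it is self-contained and runnable; with MySQL you would use a connector (e.g. mysql-connector-python) but build the same single multi-row statement. The table and column names are the ones from the article.

```python
import sqlite3

# In-memory database as a stand-in for a MySQL connection.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE insert_table "
    "(datetime TEXT, uid TEXT, content TEXT, type INTEGER)"
)

rows = [(str(i), f"userid_{i}", f"content_{i}", i) for i in range(3)]

# Build ONE multi-row INSERT: a single parse and a single round trip,
# instead of one statement per row.
placeholders = ",".join(["(?, ?, ?, ?)"] * len(rows))
flat = [value for row in rows for value in row]
conn.execute(
    "INSERT INTO insert_table (datetime, uid, content, type) "
    f"VALUES {placeholders}",
    flat,
)
conn.commit()

count = conn.execute("SELECT COUNT(*) FROM insert_table").fetchone()[0]
```

Using parameter placeholders rather than string-formatting the values also avoids SQL injection, which the raw concatenated statements in the article would be vulnerable to if the values came from user input.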
2. Put insertions in one transaction
If we write the insert statements as:
- START TRANSACTION;
- INSERT INTO `insert_table` (`datetime`, `uid`, `content`, `type`) VALUES ('0', 'userid_0', 'content_0', 0);
- INSERT INTO `insert_table` (`datetime`, `uid`, `content`, `type`) VALUES ('1', 'userid_1', 'content_1', 1);
- ...
- COMMIT;
Using a TRANSACTION improves insertion performance because, when executing an INSERT statement, MySQL starts a transaction internally and inserts the row within it. When we put the insertions inside an explicitly created transaction, we avoid the cost of repeatedly creating and committing transactions: there is only one transaction, and after all records are inserted we commit once.
Also, the table's indexes are normally updated after every insert, which affects overall insertion efficiency. Inside a transaction, that index maintenance is deferred until all records are inserted and the COMMIT statement is executed.
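The one-transaction pattern above looks like this in code. Again this is a self-contained sketch with sqlite3 standing in for MySQL: sqlite3 opens a transaction implicitly before the first INSERT and keeps it open until `commit()`, so all rows share a single transaction and a single commit, which is the same effect as the explicit `START TRANSACTION` / `COMMIT` pair in the article.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE insert_table "
    "(datetime TEXT, uid TEXT, content TEXT, type INTEGER)"
)

rows = [(str(i), f"userid_{i}", f"content_{i}", i) for i in range(1000)]

# All 1000 INSERTs run inside one transaction; nothing is committed
# until the single commit() below (the counterpart of COMMIT in SQL).
cur = conn.cursor()
cur.executemany(
    "INSERT INTO insert_table (datetime, uid, content, type) "
    "VALUES (?, ?, ?, ?)",
    rows,
)
conn.commit()  # one commit for all 1000 rows

count = conn.execute("SELECT COUNT(*) FROM insert_table").fetchone()[0]
```

A side benefit worth noting: if the process is interrupted mid-load, the uncommitted transaction rolls back, so you never rerun against a half-loaded table, which addresses the rerun risk mentioned at the start of the article.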
Things to note:
1. SQL statements have a length limit, so make sure the combined statement does not exceed it. The limit can be changed via the max_allowed_packet setting; the default is 1M in older MySQL versions (newer versions default to a larger value).
2. Limit the transaction size: if a transaction is too large, it may hurt execution efficiency. MySQL has a configuration option named innodb_log_buffer_size; when the transaction's log data exceeds this value, it is flushed to disk, which affects execution efficiency.
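Both limits point to the same practical pattern: split the rows into fixed-size batches, with one statement and one commit per batch. The sketch below (sqlite3 again as a stand-in for MySQL; the batch size of 500 is an arbitrary illustration, not a recommended value) combines the two techniques from the article while keeping each statement and transaction bounded.

```python
import sqlite3


def insert_in_batches(conn, rows, batch_size=500):
    """Insert rows in fixed-size batches: one multi-row statement and
    one commit per batch, so neither the SQL statement nor the
    transaction grows without bound."""
    cur = conn.cursor()
    for start in range(0, len(rows), batch_size):
        batch = rows[start:start + batch_size]
        cur.executemany(
            "INSERT INTO insert_table (datetime, uid, content, type) "
            "VALUES (?, ?, ?, ?)",
            batch,
        )
        conn.commit()  # one commit per batch


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE insert_table "
    "(datetime TEXT, uid TEXT, content TEXT, type INTEGER)"
)
rows = [(str(i), f"userid_{i}", f"content_{i}", i) for i in range(1200)]
insert_in_batches(conn, rows)  # 1200 rows -> 3 batches of <= 500

count = conn.execute("SELECT COUNT(*) FROM insert_table").fetchone()[0]
```

In a real MySQL deployment you would tune the batch size against max_allowed_packet and innodb_log_buffer_size rather than hard-coding it.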
Source : http://blog.csdn.net/tigernorth/article/details/8094277