随机主键对InnoDB插入性能的影响

孔子说得好,“学而不思则罔,思而不学则殆”。看书看多了不去想不去做对比,很快脑子就会一片混乱。这篇blog稍微介绍下随机主键对InnoDB插入性能的影响。

InnoDB引擎的行结构相当复杂(compact、redundant、Barracuda)。为了弄明白这一个细节,我翻了很多遍官方手册。后来又对比着看了很多本书,发现他们自相矛盾,无奈之下,只能从简单的开始测起。下面先摘抄两段矛盾的描述:

《Pro MySQL》,169页:

Remember that InnoDB tables follow a clustered data organization where the data page is clustered, or ordered, based on the primary key value. Would it then surprise you to know that InnoDB does not actually store records in the order of the primary key?
“But wait!” you say. “How is it possible that a clustered data organization can be built on index pages without those records being laid out in primary key order?” The answer lies in the storage engine’s use of next-key pointers in the data records.
The designers of InnoDB knew that maintaining clustered index data pages in sort order of the primary key would be a performance problem. When records were inserted, the storage engine would need to find where the record “fit” into the appropriate data page, then move records around within the file in order to sort correctly. Updating a record would likewise cause problems. Additionally, the designers knew that inserting records on a heap structure (with no regard to the order of the records) would be faster, since multiple insertions could be serialized to go into contiguous blocks on the data page. Therefore, the developers came up with a mechanism whereby records can be inserted into the data page in no particular order (a heap), but be affixed with a pointer to the record that had the next primary key value.
The InnoDB storage engine inserts a record wherever the first available free space is located. It gets this free record space address from the page header section. To determine the next-key pointer, it uses the small,  ondensed page directory trailing section of the data page to locate the appropriate place to insert the primary key value for the inserted record. In this way, only the small page directory set of key values and pointers must be rearranged. Note also that the next-key pointers are a one-way (forward-only) list.

《High Performance MySQL 2nd》,117页:

If you’re using InnoDB and don’t need any particular clustering, it can be a good idea to define a surrogate key, which is a primary key whose value is not derived from your application’s data. The easiest way to do this is usually with an AUTO_INCREMENT column. This will ensure that rows are inserted in sequential order and will offer better performance for joins using primary keys.
It is best to avoid random (nonsequential) clustered keys. For example, using UUID values is a poor choice from a performance standpoint: it makes clustered index insertion random, which is a worst-case scenario, and does not give you any helpful data clustering.

是不是很崩溃?《Pro MySQL》介绍了原理;而《High Performance MySQL 2nd》反对了《Pro MySQL》的观点,而且还在之后的篇幅里面用一个并不太能说明问题的实验来证明自己的观点。算了,还是得靠自己……

1、建表:

CREATE TABLE `test` (
`a` char(32) NOT NULL default ”,
`b` varchar(5000) default NULL,
PRIMARY KEY  (`a`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1

加上一个varchar的大字段的原因是,如果InnoDB主键随机插入存在行数据的搬迁,那么性能下降将会非常明显。这里的MySQL采用了O_DIRECT的模式,并且将buffer_pool减少到500M,尽量减少内存中的数据以体现这一差别。

2、写两个单线程程序。一个产生递增主键并插入到test表内;另外一个产生随机主键并插入到test表内

测试结果是以下两个图:

asc-pk-thumb

rand-pk-thumb

事实证明,《Pro MySQL》对InnoDB的描述是不对的。不过《Pro MySQL》已经出版了很多年了,或许以前的InnoDB是按照他说的方式实现的也未可知。从测试结果来看,InnoDB(MySQL5.0.84自带)是真正采用了聚集索引,数据存放的物理位置与聚集索引相关。

觉得文章有用?立即: 和朋友一起 共学习 共进步!

猜您喜欢

发表评论

电子邮件地址不会被公开。 必填项已用 * 标注

*

您可以使用这些 HTML 标签和属性: <a href="" title=""> <abbr title=""> <acronym title=""> <b> <blockquote cite=""> <cite> <code> <del datetime=""> <em> <i> <q cite=""> <strike> <strong>