MySQL排序原理分析

    下面是介绍mysql如何排序的一篇文章:

How MySQL Does Sorting (filesort)

MySQL has two filesort algorithms for sorting and retrieving results. The original method uses only the ORDER BY columns. The modified method uses not just the ORDER BY columns, but all the columns used in the query.

The optimizer selects which filesort algorithm to use. Prior to MySQL 4.1, it uses the original algorithm. As of MySQL 4.1, it normally uses the modified algorithm except when BLOB or TEXT columns are involved, in which case it uses the original algorithm.

The original filesort algorithm works as follows:

  1. Read all rows according to key or by table scanning. Rows that do not match the WHERE clause are skipped.
  2. For each row, store a pair of values in a buffer (the sort key and the row pointer). The size of the buffer is the value of the sort_buffer_size system variable.
  3. When the buffer gets full, run a qsort (quicksort) on it and store the result in a temporary file. Save a pointer to the sorted block. (If all pairs fit into the sort buffer, no temporary file is created.)
  4. Repeat the preceding steps until all rows have been read.
  5. Do a multi-merge of up to MERGEBUFF (7) regions to one block in another temporary file. Repeat until all blocks from the first file are in the second file.
  6. Repeat the following until there are fewer than MERGEBUFF2 (15) blocks left.
  7. On the last multi-merge, only the pointer to the row (the last part of the sort key) is written to a result file.
  8. Read the rows in sorted order by using the row pointers in the result file. To optimize this, we read in a big block of row pointers, sort them, and use them to read the rows in sorted order into a row buffer. The size of the buffer is the value of the read_rnd_buffer_size system variable. The code for this step is in the sql/records.cc source file.

One problem with this approach is that it reads rows twice: One time when evaluating the WHERE clause, and again after sorting the pair values. And even if the rows were accessed successively the first time (for example, if a table scan is done), the second time they are accessed randomly. (The sort keys are ordered, but the row positions are not.)

The modified filesort algorithm incorporates an optimization such that it records not only the sort key value and row position, but also the columns required for the query. This avoids reading the rows twice. The modified filesort algorithm works like this:

  1. Read the rows that match the WHERE clause.
  2. For each row, record a tuple of values consisting of the sort key value and row position, and also the columns required for the query.
  3. Sort the tuples by sort key value
  4. Retrieve the rows in sorted order, but read the required columns directly from the sorted tuples rather than by accessing the table a second time.

Using the modified filesort algorithm, the tuples are longer than the pairs used in the original method, and fewer of them fit in the sort buffer (the size of which is given by sort_buffer_size). As a result, it is possible for the extra I/O to make the modified approach slower, not faster. To avoid a slowdown, the optimization is used only if the total size of the extra columns in the sort tuple does not exceed the value of the max_length_for_sort_data system variable. (A symptom of setting the value of this variable too high is that you should see high disk activity and low CPU activity.)

    以上两种方法,一种是老方法,另外一种是改进的方法。两种方法各有自己的优缺点。第一种方法,对sort_buffer_size的占用主要是排序的字段以及一个row pointer,这种方式对排序内存的占用较少,可以有效的减少磁盘排序,但坏处也相当的明显,要对表进行两次扫描,第二次是要根据排好序的字段的row pointer指针回表取数据,这种方式,随着排序行数的加大,对时间的消耗将会变得无法忍受(类似于oracle的nested loop)。老方法受到两个参数的影响:sort buffer size以及read_rnd_buffer_size。

    如果采用改进型方法,这种新方法,只会扫描表一次,由于在一次扫描表时,取出了查询所有需要的字段,不仅包括排序字段,全都要放在排序内存(sort buffer)里,对排序内存的需求量比较大一些,如果是对少量结果集进行排序,这种新方法与老方法的性能差异并不会太明显(此前提是在并发量较低的情况下,如果这样的排序的SQL执行频率较高,那么新方法相对老方法,优势也比较明显),因为老方法在经过第一次数据扫描后,需要排序的少量的行都在内存里,第二次也只是到内存里取,这个速度相对来说,还是比较快的。如果是大量行的排序,新方法的优势便会体现出来。

    对于少量行的排序,这个工作,还是交给应用层来做最合适,也多不了几行代码。

觉得文章有用?立即: 和朋友一起 共学习 共进步!

猜您喜欢

6 thoughts on “MySQL排序原理分析

  1. Mayware Formula IV / 4 owners manual, service manuals and schematics are
    for reference only and the Vinyl Engine bears no duty for errors or other inaccuracies.

发表评论

电子邮件地址不会被公开。 必填项已用 * 标注

*

您可以使用这些 HTML 标签和属性: <a href="" title=""> <abbr title=""> <acronym title=""> <b> <blockquote cite=""> <cite> <code> <del datetime=""> <em> <i> <q cite=""> <strike> <strong>