Page 4 of 7

An Optimum In-place Merge Algorithm

\noindent In this report we consider the problem of merging two sorted lists of m and n keys each in-place. We survey known techniques for this problem, focussing on correctness and the attributes of Stability and Practicality. We demonstrate a class of unstable in-place merge algorithms that uses block rearrangement and internal buffering that actually does not merge in the presence of sufficient duplicate keys of a given value. We show four relatively simple block sorting techniques that can be used to correct these algorithms. In addition, we show relatively simple and robust techniques that does stable local block merge followed by stable block sort to create a merge. Our internal merge is base on Kronrod's method of internal buffering and block partitioning. Using block size of O(\sqrt{m+n}) we achieve complexity of no more than 1.5(m+n)+O(\sqrt{m+n}\log2(m+n)) comparisons and 4(m+n)+O(\sqrt{m+n}\log2(m+n)) data moves. Using block size of O((m+n)/\log2(m+n)) gives complexity of no more than m+n+o(m+n) comparisons and 5(m+n)+o(m+n) moves. Actual experimental results indicates 4.5(m+n)+o(m+n) moves. Our algorithm is stable except for the case of buffer permutation during the merge, its implementation is much less complex than the unstable algorithm given in \cite{Viliam Geffert et al} which does optimum comparisons and 3(m+n)+o(m+n) moves. Buffer extraction and replacement is the only additional cost needed to make our algorithm stable.
\end{abstract} More detailed explanation of these algorithms is available from the publication mentioned at the end of this page.

Key Terms: Internal Merge, Stable Merge, Kronrod's technique, Block Rearrangement, Internal Buffer,


\section{The Problem Definition}

Our two sorted lists to be merged are located in two arrays denoted as A and B, where |A|=m, |B|=n. Arrays A and B are parts of the larger array L defined as follows:A=L[0:m-1] and B=L[m:m+n-1]. We desire to stable merge A and B in-place using the locations in L as output for the final merge.

A current block is a block on which a merge operation is currently taking place.
A first block is the block that will be exhausted first during the merge of two current blocks in the local block merge operation. This block determine the next location of the moving buffer.
No need to locate blocks in series
A full block has k elements in it, where k is the block size

An Instance Illustration of the New Merge Technique

Assume that number of unmerged full blocks in A is the same as the number of unmerged full blocks in B. Also take the last value in each block as the mark.

Possible distribution of values in A and B at the end of merging the 2nd block in L

Assume that A1 is the next block with the smallest mark. The distribution of values in L at the end of merging the 3rd block is as illustrated below.

Possible permutation of blocks in L at the end of merging all blocks

Note that except for the last k size block in the A list and the permuted buffer, the values in each list at this stage are sorted. A single block exchange operation brings the buffer into a single k size block whilst at the same time bringing the two undersized merged blocks into a single full block. The buffer is then sorted using an optimum sort algorithm. A stable block sort on L excluding the buffer, creates a single sorted list as desired.

Permutation of L at the end of block sorting


Number of comparisons to place each element during local block merge is (n+m)-(k+s+s'+r+1)

Number of comparisons for block selection during local block merge is n = n1+n2-1, where n1 is the number of full blocks in A and n2, is the number of full blocks in B.

Number of data moves to place each element during local block merge is 3{(n+m)-(k+s+s'+r)} Number of comparisons during block sorting is "n(n-1) if a selection sort is used to sort the blocks.

Number of data moves is an additional 3{(n+m)-(k+s+s'+r)} Therefore, total comparisons from these two phases is 3{(n+m)-(k+s+s'+r)}+ n2. Total data moves is 6{(n+m)-(k+r+s+s')}. Note that we can reduce the number of data moves per element from 3 to 2 by using the technique, which explained later on.

Therefore, this reduces the total number of data moves to 4{(n+m)-(k+s+s'+r)}.

Comparisons and data moves respectively during buffer extraction are as follows (k+s+r) comparisons and no more than 3(k+2s+r) moves.


Merging with m+n+o(m) comparisons and 2(m+n) +o(m) Moves

We consider the problem of merging two sorted lists of m and n keys each in-place. We develop algorithms that uses m(t+1)+n/2t, m <= n, t=\lfloor\log2(n/m)\rfloor comparisons and no more than 3(m+n)+o(m+n) data moves for unstable merge. Implementation of our algorithms is simple, robust, less complex and faster than that given by Geffert et al who have derived the best known unstable algorithm so far that does m(t+1)+n/2t+o(n) comparisons and 3(m+n)+o(m) data moves. We show that with the use of a further constant extra memory k we can modify the algorithm to achieve optimum comparisons and 2(m+n)+4m/k+o(m) data moves for stable merge. We use our results to implement a stable internal merge-sort that does (1+2\log2(k)/k)N\log2(N)-N-1) comparisons and 2(1+2/k)N\log2 data moves to sort a list of N data items.

Key Terms: Internal Merge-sort, Sorting, Internal Buffer, Stable Merge, fastest internal merge, best merge algorithm

This internal merge operation is the fastest internal merge there is! It is used to implement an internal merge-sort that results in the fastest internal stable sort algorithm there is! The merge operation is done in a manner very similar to that of natural merge except that we are using Kronrod's ideas of block partition and the internal buffer.

Operational attributes of the algorithm that distinguishes it from previous algorithms are

* Block merge is done concurrently with block selection, to ensure correctness and simplicity of implementation.
* Encoding technique to locate displace blocks in their original sorted order during the merge.
* Clear explanation of operations and their complexity
* No block sorting is done and local block merge operation is stable

To do this, we compare the smallest values in A and B and then exchange the smaller of the two values with the value at the next output location in L starting at L[0]. Therefore, the values in locations occupied by A are moved to locations previously occupied by values in B and are permuted in a random manner as they are moved. Therefore, because values from either list are selected for output in a random order, the values in A are displaced sequentially but are place randomly in their displaced locations. The number of displaced A values at any instance in time equals the number of merged B values. Therefore, A values are displaced into locations previously occupied by B values. These displaced values are permuted.

To locate the next A value to be used in the next comparison after an A value has been output we need an algorithm that will find this value, preferable in constant or linear time. There is no algorithm that does this in constant time. An algorithm that does this in linear time uses additional information gathering operation and is generally not stable. Therefore, they add to the computational complexity of the merge operation, they do not select the next value from the permuted list in a stable manner and do not give an overall stable operation. During the merge operation, the displaced list can grow to a maximum of m. The displaced list then sw inks back to 0 at the end of the merge operation. If the length of the displaced list is lj at an instance in time, then the linear algorithm will do \fract{1}{2}lj comparisons on average to locate the smallest value in the list. This results from the fact that the displaced values are permuted at random during the merge operation. However, we show further on that this permutation of displaced values is predictable and can therefore be used to develop constant time algorithm to locate the next smallest value from the A list.

An Instance Illustration of the New Merge Technique


The first block is merged directly into its sorted location.

If we assume in this case that the mark of B1 is less than the mark of A1, then we choose the start of the next k buffer locations to be the 1st location in B1.
Therefore output of the next displaced A block i.e. A2 goes here.
Configuration of L at the end of merging the next full block is as illustrated below.

Blocks are merged directly into sorted location with buffer split between A and B.

We now assume that A1 has the smaller mark to that of B2. Therefore, we begin to place the elements of the next displaced A block starting with the first buffer location to the front of the elements in the remaining section of the first A block.

Configuration of L at the end of merging the next block in L is as illustrated below.

Displace A blocks are selected in original sorted order.

Configuration of L if the next selected block is B3

Possible configuration of L after all A blocks has been displaced from their original locations. At this point, a roll over counter in addition to the encoded information is used to keep tract of those blocks that are further displace from their original displaced location.

Roll over counter keeps tract of secondary displace A blocks as follows. If an A block, say Ar was at a displaced location t. If the value of t is less than the roll over counter value, then Ar did not encounter a secondary displacement and is therefore still at its original displaced location. Contrary to this, the new location of Ar is the original location plus the roll over counter value. The original location is obtained from encoded information.


Now reduced to n+m-1 and 3(n+m) comparisons and data moves respectively for merging and no more than 2nlog2(k) and 6n additional comparisons and data moves respectively for block encoding and decoding.
Additional Cost due to Buffer extraction
For unstable merge, this is approximately
For stable merge, this is


6. A stable optimum in-place merge with 2(n+m)+o(n+m) moves Modifications to reduce the number of comparisons/data moves Correctness
Complexity Analysis

In this case, buffer extraction, use and sorting are simplified. However, block encoding becomes more complex. No need for complex buffer extraction and replacement to achieve overall stability of the merge operation.

Only need 2 data move operations to merge each element. Number of comparisons remain the same except for block location encoding and decoding operations during the local block merge.

New Block Encoding Technique

Because in this case, the value of k is a constant independent of the values m and n, there may be more displaced A blocks to be encoded than there are elements in a block to implement this encoding. However, we note that the following invariant properties.

What about the encoding technique where there are sequence of equal values in a list and the sequence is greater than the block size. Ans: There is an invariant property that takes care of this situation quite easily. Can you figure this out?

To decode the location of t, do a binary search on the block boundaries using r/k as the first lower boundary.

We leave this to the reader as a useful exercise.

Using K-Way Merge to do Optimum Sort With O(N) Moves

We show that for optimum k-way merge of k pre-sorted lists, each consisting of m data values the number of comparisons is n\log2(k)-(k-1), n=km and the number of data moves is n with O(n) additional memory. We use the principles of (1) Block partitioning, (2) Data Encoding, (3) the Encoded Binary Selection Tree and (4) the Internal Buffer to develop an optimum algorithm that merges k pre-sorted lists in-place. For block size \nu=\sqrt{m}, our algorithm does no more than {(n-k-2)+\nu(2k-1)}\log2(k)+{(\nu(k-1)-k}\log2(\nu)+k-2 comparisons and 4n+8(k(\nu-1)-\nu) moves.

We use our basic $k$-way merge to implement an in-place merge-sort of $N$ values with the use of no more than N\log2(N)-N\lambda+N{\frac{\lambda}{k}+1+log2(k)}+O(log2(N)) comparisons and 2N(\lambda+4) data moves. Where \lambda=log2(N)/log2(k). We use additional constant memory to reduce the number of data moves to no more than 2N(\lambda+2). By setting k=N{\epsilon}, where 0<\epsilon <1, we improve the number of data moves to no more than 6N\epsilon+o(N) moves while maintaining about the same number of comparisons.

We consider our results to be the best for optimum comparisons and O(N) data moves so far to those given in the literature.

Key Terms: Bit, Binary selection tree, Block partition, Internal buffer, Internal merge, k-way merge, In-place and stable merge, Operational complexity analysis.

New Block Encoding Technique

Click NEXT button for Information about sorting in 2-dimension

Page 4 of 7