Block Rearrangement and the Internal Buffer

Block rearrangement consists of chopping the two pre-sorted lists to be merged into blocks of size k, where k is the number of locations in each block. In the example below, the block size is set at $k=\sqrt{m}=\sqrt{16}=4$. A given location in each block is selected, and the blocks are rearranged in sorted order of the value at this selected location. The selected location is called the mark of each block. For example, assume that the two pre-sorted lists A and B to be merged occupy smaller sections of a larger list L. In this instance the values in A occupy locations L[1] to L[16] and the values in B occupy locations L[17] to L[32]. Each block in L consists of four elements.


We group sequential elements of 4 as individual blocks in each list. We choose the last element location in each block as the mark. We then place the sequence of values in each block at selected locations based on the value at their mark. We use the expression "block sorting" to refer to this process. At the end of stable block sorting, the values in L for this instance of A and B are as given in the illustration below. I have emboldened each odd block location to highlight individual blocks of values.
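To make the process concrete, here is a minimal Python sketch of stable block sorting. The values chosen for A and B are hypothetical stand-ins for the illustration, and a real in-place implementation would rearrange the blocks with block swaps rather than building an auxiliary list.

```python
import math

def block_rearrange(L, k):
    # Split L into k-sized blocks and stably sort the blocks by the
    # value at their mark (here, the last location in each block).
    blocks = [L[i:i + k] for i in range(0, len(L), k)]
    blocks.sort(key=lambda b: b[-1])   # Python's sort is stable
    return [x for b in blocks for x in b]

# Hypothetical stand-ins for the two pre-sorted lists in the illustration.
A = [1, 3, 5, 7, 9, 11, 13, 15, 17, 19, 21, 23, 25, 27, 29, 31]
B = [2, 4, 6, 8, 10, 12, 14, 16, 18, 20, 22, 24, 26, 28, 30, 32]
L = A + B
k = math.isqrt(len(A))                 # k = sqrt(16) = 4, as in the text
print(block_rearrange(L, k))
```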

An attribute of the data values in L at the end of block sorting is that no data value is more than k locations away from its final sorted location, where k is the block size. A block is chosen as the buffer and is used for temporary data output during the local block merge operation.

Block sorting two pre-sorted lists
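Continuing the sketch above, a quick check of this distance property on the hypothetical data (the values are distinct, which keeps the index arithmetic simple):

```python
# Every value in the block-sorted list sits within k locations of the
# position it will occupy in the fully sorted list.
rearranged = block_rearrange(L, k)
final = sorted(L)
assert all(abs(i - final.index(v)) <= k for i, v in enumerate(rearranged))
```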

THE INTERNAL BUFFER

We use the last block in the block-sorted L as a temporary output location during the merge operation. We call this block "the buffer". Selected merged values go into the buffer during the local block merge operation. Therefore, values at locations in the buffer are displaced during the merge operation. We rotate the buffer to the head of the list before the merge operation begins.

The locations occupied by the buffer do not have to be fixed throughout the whole merge operation. In addition, the buffer does not have to remain a single block during the merge. Our new algorithm given herein illustrates this.
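The following Python sketch shows the buffer mechanism for merging two adjacent sorted blocks. It assumes the buffer region is disjoint from the two blocks (for example, already rotated to the head of the list), and it illustrates the idea of swapping merged values into place so that buffer contents are displaced rather than lost; it is my own sketch of the idea, not Kronrod's exact routine.

```python
def merge_with_buffer(L, buf, left, mid, right):
    # Merge the sorted runs L[left:mid] and L[mid:right] in place,
    # using the k locations starting at 'buf' as scratch space.
    k = mid - left
    # Swap the left run into the buffer so its locations become free.
    for i in range(k):
        L[buf + i], L[left + i] = L[left + i], L[buf + i]
    i, j, out = buf, mid, left            # i scans the buffer, j the right run
    while i < buf + k and j < right:
        if L[i] <= L[j]:                  # <= keeps the merge stable
            L[out], L[i] = L[i], L[out]   # emit from the buffer
            i += 1
        else:
            L[out], L[j] = L[j], L[out]   # emit from the right run
            j += 1
        out += 1
    while i < buf + k:                    # drain what is left in the buffer
        L[out], L[i] = L[i], L[out]
        i, out = i + 1, out + 1

L = [0, 0, 0, 0, 1, 3, 5, 7, 2, 4, 6, 8]  # a 4-location buffer at the head
merge_with_buffer(L, buf=0, left=4, mid=8, right=12)
print(L)  # [0, 0, 0, 0, 1, 2, 3, 4, 5, 6, 7, 8]: the buffer values survive
```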

PROBLEMS WITH KRONROD'S APPROACH

* Instability
* Impracticality
* Correctness

Some terms used hereafter are not defined in this presentation. However, you can download a full presentation, in PDF format, of the topics in this presentation by following the link given below. In addition, some terms will become clear as you patiently continue to read or reread this presentation or the related subject. If you are not an expert in this area - and in some cases even if you are - do not try to gain a full understanding of all the details in one go! Some details may even be trivial when measured against your real motivation for understanding the content herein. Did you know that, as with real-life situations, there are "shells" of knowledge and understanding on well-presented subjects? Please do not let your prejudice or emotion get in the way.

In Kronrod's algorithm, Block Rearrangement is followed by Local Block Merge. During block rearrangement, a given location in each block is chosen as the mark. Blocks are sorted by comparing their marks. Each block is selected into its sorted location based on the outcome of these comparisons. An example of block rearrangement is given in the following illustration. I am aware that this illustration may not be clear to you unless you are familiar with the terms used in this presentation. Nevertheless, skip what you do not understand and bear in mind that you may very well have understood it but did not realise this until the next time you have a go at it. I have highlighted some key terms for you in italic.

However, the basic ideas of the internal merge technique presented by M.A. Kronrod result in an unstable merge and in increased comparisons and data moves, mainly due to block rearrangement. Hence, the guaranteed lower limits of $m+n$ comparisons and $m+n-1$ data moves no longer hold. In addition, some algorithms based on Kronrod's method are not stable and in some instances will not perform a proper merge. We illustrate instances for selected algorithms that actually do not produce a merge. Hence we highlight the importance of stability during block sorting to the merge actually taking place.

Huang & Langston give a modified version of Kronrod's algorithm. This algorithm is unstable and does no more than $1.5(n+m)+O(\sqrt{n+m}\lg(n+m))$ comparisons and $6(n+m)+O(\sqrt{n+m}\lg(n+m))$ data moves to merge two pre-sorted lists of m and n data values. The average number of data swaps is reduced by moving the buffer across a series of blocks that are sorted by their last element in non-decreasing order. The need to do a block swap at the start of each local merge is eliminated by locating natural runs in the block-sorted sequence and then merging with a second series of size k, where k is the number of elements in each block. The block size is defined as $k=\sqrt{n+m}$ or $k=c(n+m)/\lg(n+m)$ for some constant $c \ge 1$.

As with Kronrod's algorithm, Huang & Langston's algorithm will not merge under similar circumstances. For example, if at the end of the block sorting phase we have the series of values (8,8,8), (4,5,8), (1,2,9), the algorithm will not correctly merge these values during the local block merge. We use single-digit whole numbers to represent each value. Blocks of values are delimited by open and close brackets. The last value in each block is taken as the mark. Huang & Langston went on to devise a stable version of their algorithm. However, this results in a more complex algorithm with a significant increase in run time, making it less practical.
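One way to see why this series defeats a local block merge, sketched in Python with a helper of my own: the small values end up much more than k locations from their final sorted positions, so a merge confined to adjacent blocks cannot place them. (The pairing below matches duplicate values left to right.)

```python
def max_displacement(flat):
    # Distance of each value from its final sorted position, matching
    # duplicate values left to right.
    final = sorted(flat)
    used, worst = {}, 0
    for i, v in enumerate(flat):
        j = final.index(v, used.get(v, 0))
        used[v] = j + 1
        worst = max(worst, abs(i - j))
    return worst

blocks = [(8, 8, 8), (4, 5, 8), (1, 2, 9)]
flat = [v for b in blocks for v in b]
print(max_displacement(flat))   # -> 6, double the block size k = 3
```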

Mannila & Ukkonen's algorithm is easily modified so that it achieves the lower bound on the number of comparisons for merging with $|A| = m < |B| = n$. The basic idea of the algorithm is a divide-and-conquer technique originally devised for parallel systems. Basically, the algorithm merges by dividing A into $v = O(\sqrt{m})$ blocks. The B list is also divided into v blocks, where the jth B block is located by using the last value in the jth block of A in a binary search to find a value in B that is greater than this value but whose previous adjacent value is less than this value, where $0 \le j \le v-1$. The jth A and B blocks are then moved to the locations they will finally occupy at the end of merging. The two blocks are merged using an internal buffer of size O(v). Unfortunately, the algorithm is unstable and actually will not perform a proper merge under circumstances similar to those given above. For example, if the two A blocks (5,5,5,5) and (5,5,6,8) are selected and then merged with the sequence of B values (0,1,2,3,4,5,9,9), a possible outcome of the pair selection operation could be as illustrated below. Note that the second B block is empty in this case and the remaining values in the B list are 9,9.

Merge blocks {(5,5,6,8),(0,1,2,3,4,5)} and then {(5,5,5,5), ( ) . . . (9,9)}.

The values are not sorted at the end of the merge operation. The same problem occurs if we select A blocks by their last element, as we could then have the two A blocks (1,2,4,5) and (5,5,5,5).
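A small Python sketch of the block-selection step just described, applied to the failing instance. The function name and list handling are my own; the real algorithm does the moving and merging in place with an O(v)-sized internal buffer.

```python
from bisect import bisect_right

def select_b_blocks(A_blocks, B):
    # Pair each A block with a B block found by binary search: the run
    # of B values up to the first one greater than the block's last value.
    pairs, prev = [], 0
    for blk in A_blocks:
        cut = bisect_right(B, blk[-1])    # first B location with a larger value
        pairs.append((blk, B[prev:cut]))  # the paired B block may be empty
        prev = max(prev, cut)
    return pairs, B[prev:]

# The failing instance from the text: an unstable pair selection has put
# the A block (5,5,6,8) ahead of (5,5,5,5).
pairs, tail = select_b_blocks([[5, 5, 6, 8], [5, 5, 5, 5]],
                              [0, 1, 2, 3, 4, 5, 9, 9])
print(pairs, tail)
# -> [([5, 5, 6, 8], [0, 1, 2, 3, 4, 5]), ([5, 5, 5, 5], [])] [9, 9]
# Merging each pair in turn leaves the 6 and 8 ahead of the trailing 5s,
# so the result is not sorted.
```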

Click the NEXT button for Information about a Better In-Place Technique with Optimum Comparisons and $O(N\log_2 N)$ Data Moves.

Page 3 of 7