In the previous chapter, the trie was introduced. It is a form of a tree-structured index with very specific characteristics. One of the characteristics that was not mentioned in the previous chapter is that there is no relationship between the parent and the children in the trie. In the context of hashed relative files, those characteristics permit dynamic reorganization of a direct access file, thereby circumventing the need for downtime that is associated with maintenance of static hashed files. With extendible hashing, direct access performance is maintained with a comparatively low cost in overhead.
In the chapter on sorting, the heap was introduced. It too is a tree structure and has the characteristic that any parent node has a smaller key that either of its children. This makes it valuable for sorting. It is also a balanced tree in that no leaf node is more than one level different from all other leaf nodes. Next we will explore the binary search tree and the concept of an index based on the binary search tree.
Early attempts at direct access efficiency explored many types of tree-structured indexes. Most of those attempts did not achieve the performance that is available with extendible hashing, but these lines of research did produce the concept that underlies the indexed sequential access systems used today. This research will be traced in the following discussion.
The problem with the binary tree is that if loaded from random inputs, the growth of the tree is unpredictable and long unbalanced chains are possible. If the first value placed on the tree is not near the middle of the range of values to be entered, unbalance will occur -- sometimes a very serious unbalance. Suppose that there is a set of seven keys, B, G, J, N, R, V, and Y. If those keys are inserted into a binary tree in the order listed, the result will be a long chain of seven nodes as shown below on the left. If the same keys arrive in a randomized order, say N, G, V, B, Y, J, and R, the resulting tree, shown on the right, is quite different!
The problems of balancing the tree and maintaining that balance apparently make the binary tree an unattractive approach. But early researchers were not willing to let go of the binary tree as a potential solution. One promising outcome of the early work was the invention of the AVL tree.
In 1962, two Soviet mathematicians, Adelson-Velski and Landis, published a description of a height-balanced binary tree structure and the operations required to maintain the balance of the tree regardless of the order in which keys are inserted in the tree. The result of the prescribed operations is to maintain the tree so that at any node, 1 is the maximum difference between the longest path to a leaf node and the shortest path to a leaf node. They began their discussion by describing the two possible situations that may arise (four if the mirror situations are counted) upon insertion of a key into the height-balanced tree.
| NODE | LEFT SUBTREE |
RIGHT SUBTREE |
DIFFERENCE | ||||||||||||||||||||||||||||||||||||
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| H | 0 | 0 | 0 | ||||||||||||||||||||||||||||||||||||
| J | 1 | 0 | -1 | ||||||||||||||||||||||||||||||||||||
| E | 0 | 2 | +2 | ||||||||||||||||||||||||||||||||||||
| C | 1 | 3 | +2 | ||||||||||||||||||||||||||||||||||||
| A | 0 | 0 | 0 |
The AVL evaluations and rotations are built on the assumption that prior to the current change the tree was in balance. It can be seen that without the node H, the tree meets the AVL requirement that the maximum difference between the left subtree and the right subtree be 1. With the node H inserted, the tree becomes out of balance. The numbers beside each node indicate the difference in the lengths of the subtrees. The numbers are positive and negative to reflect the algebraic result of subtracting the length of the left subtree from the length of the right subtree. In other words, a positive number indicates that the right subtree is deeper than the left subtree; a negative number indicates that the left subtree is deeper than the right.
Because the unbalance occurs in the right subtree (indicated by the +2 at node E), a rotation to the left is indicated. This means that a node and key from the right subtree will be moved up the tree and the current node and key will be move down the left subtree. It is therefore expedient to first create a unbalance to the right at node J. To do this, it is necessary to rotate the subtree to the right. Such a rotation produces the tree below.
With H replacing J at the node and J moving to become the right child, the unbalance is on the right. It must be pointed out that the unbalance of +1 is not an out-of-balance condition that must be corrected; the unbalance at node E is an out-of-balance condition that must be corrected. It is corrected with a rotation to the left. That is, H replaces E, E becomes the left child of H, and J remains the right child of H.
The example above is applied to the right subtree because the unbalance was on the right. Should the unbalance occur on the left, the same procedures would be employed except the directions of the rotations would be reversed.
The second, more complex, rotation was also elaborated.
Consider the tree above into which node F had just been inserted. The new node causes an unbalance at the root (node C). Because the unbalance at the root is positive, preliminary rotations will be necessary to make the unbalance at node H a positive unbalance. And before the unbalance at node H can be changed, the unbalance at node D must be made consistent (made negative) with the unbalance at node H. The tree below shows the result of this first preliminary rotation.
Next, the second preliminary rotation takes place around node H. Remember the goal of this is not to correct the unbalance but to make the unbalance take on a certain form; that is, the unbalance must be consistent in sign with the node above in the tree. The tree below shows the result of this second rotation with the signs on the right subtree all positive.
The final rotation is to correct the original excessive unbalance at the root node. In this, node F is promoted to the root node and node C shifts down to the left subtree. The tricky bit is that node D must be transferred to become the right child of node C.
This resultant is in balance. The only node that has an unbalance is the node H that is missing a left child, but the unbalance is within limits (-1, 0, +1) for an AVL tree. The mirror of this logic is applied when the unbalance is in the left subtree rather than the right subtree.
Although the AVL tree solved the problem of unbalances in the binary tree, it did not solve the problem caused by seeking for each node individually. If each node is stored separately, the average number of seeks in a search is the same as the number of comparisons in a binary search (9.5 in a 1000 record search). This performance is still unacceptable. The concept of blocking was tried because it produced dramatic improvements in earlier situations involving unacceptable amounts of seeking.
In this problem context, blocking became the "paged" binary tree. A page is a physical record in which several nodes are stored. That is, a parent and several levels of descendants would be stored on a page. Descendants from the lowest level on a given page would be identified with pointers to pages containing those descendents. By storing several connected nodes in one physical record, multiple nodes may be retrieved with a single seek and overall search performance is greatly improved.
If all of the keys were available at the time the tree was first built, the structure would require no maintenance. On the other hand, with a dynamic file in which the keys are presented in random order, the maintenance problem is significant. Rotations that are confined to a single page are not a problem, but rotations that cross page boundaries pose big problems, especially when entire subtrees must be transferred from one page to another. In these cases, changes may be needed in not only the adjacent page, but also in all of the descendant pages!
Paged binary trees solved one problem but reintroduced another. The history of file structures is filled with similar cases.
In the late 1960s, the problem of maintaining a large index (too large to be held all in main memory) remained. Research at Boeing Computer Services by Bayer and McCreight (1972) produced the concept of B-trees. This tree structure allows more than one key per node. Whereas the binary tree allowed only a single key per node along with pointers to decendents, the B-tree allows a large number of keys per node along with the pointers to the decendents of each key. Rather than have a tree structure (or more accurately, a part of a tree structure) within the page as with the paged binary tree, the B-tree has one (very large) node per page. In addition, the B-tree maintains the index data within the node as ordered lists to facilitate searching.
The structure of a B-tree node is rather complex. The data in the node consists of keys and pointers. The key is a data key and the pointer is the address on disk of another index node (a child) or of the associated data record or records in the case of a leaf node. The manipulation of the pointer is straightforward and uninteresting. It is the manipulation of the keys that forms the power of the B-tree. In addition to the keys and the pointers, other data, such as the number of keys at the node, will be stored. Below is a conceptual representation of the index node expressed as C structures.
struct k_data
{
char key[8];
int left_child_pointer; /* byte offset into the index file (use with fseek()
to find the child structure on disk) */
};
typedef struct
{
int number_of_keys;
/* other data as needed */
struct k_data key_data[1000];
int right_child_pointer;
} I_NODE;
At any level of the index, the rightmost key is a separator between a left child that has lesser valued keys and a right child that has keys of greater or equal value. The structure defined above accommodates that situation with a single right child pointer. For any other key at a given level, the left child of the next key to the right is its right child.
Given a structure as defined above and populated with key values BAKER, GORMAN, LESTER, NORMAN, QUIGLEY, and THOMAS, how does one find MURPHY? That makes the number_of_keys for this node 6. The search mechanism is relatively simple. After moving the value MURPHY into a target_field, the logic proceeds as follows.
+- search routine
|
| +== for(i=0, next_seek=0; i<node_name.number_of_keys; i++)
| |
| | result=strcmp(target_field, node_name.key_data[i].key)
| | if(result<0) break
| +== end for
|
| +-- if(i==node_name.number_of_keys)
| | next_seek=node_name.right_child_pointer
| +-- else
| | next_seek=node_name.key_data[i].left_child_pointer
| +-- end if
|
| fseek(fp,next_seek,SEEK_SET)
| fread(&node_name,sizeof(I_NODE), 1, fp)
|
+- end search
All that is needed a method to determine when the pointer to the child is actually a pointer to a data node. One way would be to have a flag in the index node that would be set for leaf nodes. Prior to seeking, the process would check to see if the current index node were a leaf node; if so, then the next seek would be to a data node and the fread() would reference the structure appropriate to a data node. Another way would be to maintain a count of the number of index levels in the root node and loop through the process that number of times minus 1 (for the root which is already in memory). In either event, the routine above would be nested in another loop that would be exited prior to the read of the data node.
The nodes of the B-tree have a number of characteristics that make up the definition of the B-tree.
The student should be aware that the definition of order used above differs from that used by some other authors. Other definitions of order have been based on maximum number of keys and on maximum number of children. Folk and Zoellick (1972, p.363) argue that order based on the minimum number of keys poses problems when an odd maximum number of keys is implemented, but they offer no rationale for having an odd maximum and ignore the fact that using Bayer and McCreight's definition does not allow an odd maximum and therefore no problem should arise. The definition used herein conforms to that used by Bayer and McCreight (1972). This definition has the advantage that the maximum number of keys is always an even value. Therefore, when a new key is to be inserted and the node will be overfilled, selecting the middle key for promotion is easy and straightforward.
Rather than extend the tree from the top down as is done with binary trees, the B-tree grows from the bottom up by splitting nodes and promoting keys to superior nodes in the tree structure. In the beginning, there is only a single node; it is both root and a leaf node. Consider the following situation involving a node with a capacity order of 1. With the addition of key L, the node must be split. The data is moved to a work area. The original node becomes the left child, and the left portion of the keys (with other associated values) are moved there. A new node is created to become the right child, and the right portion of the keys are moved there. Because the split occurred at the root node, a new root is created. The middle key is promoted to the new root, and appropriate pointers to the left and right children are moved into place.
With the arrival of key H, the right child is filled as shown on the left below. When another key P is to be inserted, another split must be performed. After splitting and promotion, the tree is as shown on the right.
To understand the growth of the tree, consider what happens when key D is inserted on the left below and is followed by key C. For the key C to be added, the leftmost node must be split; the split and promotion results in three keys at the root. The root must now be split using the same process as for the leaf nodes. When accomplished, a new root is generated and key F is promoted to the root node.
The need for a deletion arises from the concatenation of the left and right child of a key. First, the key is located. Hereinafter, the node containing the key to be deleted will be called the target node. If the deletion will not cause the target node to fall below the minimum size d, then the removal of the key is straightforward, and the pointer to the right child is set to null (this presumes that when concatenation of children occurs, the results are stored in the left child).
Consider the B-tree above. Deletion of key H results in an "underfill" at that node, the target node. The target node must be concatenated with its sibling (a node with the same parent), in this case the node with key P. In reverse of the split and promote logic for insertion, deletion involves demotion of the parent and concatenation. In this case, key L would be demoted and placed in the work area along with the key(s) from the sibling node. The results are place in the target node and the pointer to the now empty sibling is eliminated. The parent node is then reorganized. The result of these operations is shown below.
The deletion of key X from the tree above poses no problem; however, a successive deletion of key U raises an interesting situation. When key S is demoted to the target node, concatenation of the target node with its left sibling produces an "overfill" which must then be handled as a split and promote, as in an insertion. This results in key P being promoted to the parent node with key L in the left child and key S in the right child. The result of all of these operations is shown below.
Another way to look at the result of the above operations is that a rotation occurred to redistribute the keys to delay more extensive operations on the tree. Before any deletion occurs, the need for such a rotation can be predicted by first checking to see if the number of keys at the target node is greater than d; if so, deletion can occur without further concern; if not, then sum the number of keys at the target node with the number of keys in its left sibling (right sibling if the target node is the leftmost node in its subtree). If the sum is greater than 2d, redistribution to the target node should occur before deletion; if not, then deletion and concatenation must occur.
Deletion of key D from the tree above will demonstrate the processes when concatenation produces an "underfill" at an intermetiate node in the tree. First, after all appropriate checks, key C is demoted to the target node, and the target node is concatenated with its left sibling. The effect of demoting key C is to make its former position a target node, to delete the key, and to create an "underfill." The former parent of key C, key F, after appropriate checks, is demoted to its child, thereby creating an "underfill" at the root. The target node is concatenated with its sibling, and the result is made the new root.
The rotations and concatenations are predictable and programmable. Although the examples above use a d of 1, in practice d is large, with the value being determined by the size of the physical storage unit. The result is that the tree structure becomes very bushy, wide, and shallow, which is the appeal of this type of structure for index systems.
In the B-tree, splitting is done on the basis of one node into two. This occurs when one node "overfills." What would happen if splitting were delayed by using redistribution? Before splitting, check the right sibling to see if there is space there; if space exists, then demote the parent key into the right sibling and promote the largest key in the target node to the parent position. By employing this redistribution-on-insertion strategy, we achieve a better utilization of disk space. This higher "fill rate" would also tend to minimize the need for concatenation of nodes on deletion.
Consider the situation in which both the target node and its right sibling are filled. In this case, the strategy is to employ two into three splitting. This is a more extensive reorganization, with two-thirds of the total keys going into each of the three nodes (two old nodes plus one new node). Because of this reorganization, two new separators must be selected, one for the two sets of keys to be deployed into the two old nodes and one for the two sets of keys deployed into the old right node and the new node. These must be inserted, along with appropriate pointers, into the parent node.
Consider the situation shown below. The tree has two nodes, both full. What happens if the key D needs to be inserted into the tree? First, it is determined that D belongs in the left node of the tree; further, it is determined that the left node is full. Time for redistribution, and therefore the right sibling is evaluated. The right sibling is also full; time for 2 to 3 splitting. The operations could be coded to take place in the existing structures with a new node added; however, it is easier to understand the operations with a work area to hold the data while it is prepared for the reorganization. The student should carefully follow the movement of the data.
The figure below shows the result of the preparing for the split. The keys of the two nodes are moved to a work area along with the parent I. The pointers in the leaf nodes are also copied, with the right child of H becoming the left child of I. Then the key D, with its associate pointer are inserted into the work area at the correct place (recall the operation of an insertion sort). At this point, all of the keys and all of the pointers involved in the reorganization are in the work area. For practical purposes, the two leaf nodes and the one position in the root are empty.
Now, the new node is created and its place on disk is identified (probably at end of file). With this information it is now possible to complete the reorganization. The first 2d/3 positions are moved to the old target node. The next key is promoted to the root as a separator (the pointer to the left child does not need to be changed). The next 2d/3 positions are moved to the old right sibling, and the next key is promoted to the root as a separator (its left child is the old right sibling) and the pointer to the new node becomes the its right child. Finally, the last 2d/3 positions are moved to the new node which is logically right of both of the old nodes.
Two points need to be made about the size of the structures needed in the above operations. To support the 2 to 3 splitting and the 66% minimum fill criteria, the order of the index node should be in integer multiple of 3, an example of which is shown in the graphics above. The size of the work area is 4d + 2, 4d to hold the contents of the two nodes, one position for the demoted parent, and one position to hold the new key which started all of this in the first place.
On deletions, the reverse rules apply. Redistributions are made between the target node and its left and right siblings. The minimum number of keys (in the terms defined for the B-tree) is 4d/3. When both the left and right siblings of the target node are at the miminum number of keys, three-into-two concatenation occurs. This reorganization, like the splitting above, involves selecting a new separator for the two nodes, placing that key and the appropriate pointers into the parent node in place of one of the old separators, and removal of the other separator from the parent. Like the operations above, this is best visualized with the use of a work area.
The following characteristics, then, define the B* tree. First, the miminum "fill rate" of any node except the root is 66%. Second, redistribution is employed on additions as well as deletions. Third, splitting is done on a two-into-three basis (concatenation is three-into-two). The gain with the B* tree is more efficient space utilization and a smaller index tree (fewer nodes and perhaps a shallower tree) overall.
Consider the effect of adding an extra field to the leaf nodes that provides a pointer to the right sibling node. Such an arrangement would allow access to the next index in key sequence without traversing the index tree, thereby providing reasonably efficient sequential access. The cost of this would be some extra logic when leaf nodes are split or concatenated to insure the pointers are maintained correctly, small enough a price to be able to process sequentially by primary key.
The index node to support this system is presented below. Note the addition of the right_sibling_pointer.
struct k_data
{
char key[6];
int left_child_pointer;
};
struct i_node
{
int number_of_keys;
int right_sibling_pointer;
/* other data data as needed */
struct k_data key_data[1000];
int right_child_pointer;
};
Now the entire file may be processed and only the leaf nodes in the index tree need be accessed after the leftmost leaf node in the tree is accessed. It is also possible to begin sequential access with any key in the file. All that is needed is to perform direct access of the starting key and then shift the processing logic to that used for sequential access; all of the information for such processing is available in the leaf node that was used for direct access. In this way, the B+ tree provides efficient direct access and the capability for flexible sequential access.
With respect to the desirable qualities of an index, the B-tree overcomes the limitations of other structures. The usefulness of the structure is attested to by the fact that it became the de facto standard for indexed file systems within a decade.
It is not suggested that the student program an indexed-sequential access method (ISAM) system. Rather, the student should understand its operation and what is logically required to operate such a system. The student should be aware that an ISAM system is built in to many operating systems and that another source for such capability would be a third-party package, like Btrieve or Informix C-ISAM. It is the goal of this chapter to give the student enough understanding that sound decisions can be made regarding the use of such packages.