Chapter 9 - Extendible Hashing

Background

With all of the discussion of blocks, buckets, indexes, pointers, sorting, and searching, the problem of the dynamic file remains. Without periodic reorganization, static hashing will produce unacceptable performance as the file grows. During the reorganization, the file must be temporarily stored as a sequential file, the hashing algorithms and the implementation programs changed to reflect the new file size, the new file area initialized, and the data then processed through the new hashing algorithm to populate the new area. And naturally, the file will be unavailable during the maintenance period. If the business application requires access seven days a week, 24 hours a day, this situation is unacceptable. Is there any way to organize this file so that it can grow and not require maintenance or downtime?

Perhaps the index and the bucket can help out again.

An Index

Consider the result of hashing: a binary number. What if, in the beginning, all of the records whose keys hashed to an odd number were placed in one bucket and all of the records whose keys hashed to an even number were placed in another bucket? All that would be required is the final (lowest-order) binary digit of the hash result. A 1 indicates an odd number; a 0 indicates an even number. A simple index can then be set up using this last digit as a subscript, with the entry in the index being the relative record number of the bucket containing all of the records with the appropriate (odd or even) hash result.

On the left the index is represented as a special form of a tree structure called a trie (rhymes with sky). This is a kind of binary tree and one of its characteristics is that it is always complete. On the right is the corresponding representation of the leaves of the tree in table form. In this image, there are pointers (relative record numbers) to two buckets. These buckets are sized to match a natural storage unit (perhaps a cluster or even a track). For the purposes of the example, consider the bucket size to be 3 records.
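In code, choosing between the two buckets at this stage requires only the remainder of the hash value divided by 2. The short sketch below illustrates the idea; the function name one_bit_sub is introduced here for the demonstration only.

       /* ILLUSTRATION ONLY: with a one-bit index, the low-order
          binary digit of the hash selects the bucket             */

     int one_bit_sub(int hash)
     {
        return hash % 2;     /* 1 = odd bucket, 0 = even bucket */
     }

A hash value of 6 (binary 110) returns 0 and selects the even bucket; a value of 9 (binary 1001) returns 1 and selects the odd bucket.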

Given the above view of our index, what happens when one of the buckets fills up? Clearly, the bucket must be split. But how should it be split? Using a single binary digit, the information available has been exhausted. There is no choice: the next binary digit must be used to provide the basis for the split. The tricky part is that the low-order digit will not remain the low-order digit; rather, the former low-order digit will be promoted one position (shifted left), and the former second digit of the hashing result will become the new low-order digit of the subscript. The effect is to preserve the odd-even split begun above. This may be clearer with an example; consider the binary numbers below, where the low-order digit is the rightmost digit of each number.


	000
	010         these go in the even bucket (call it the A bucket)
	100
	110

	001
	011         these go in the odd bucket (call it the B bucket)
	101

If the even bucket must be split, the initial separation based on odd or even can be preserved only if the 0 becomes the most significant digit in the new subscript. To do otherwise would cause the entire file to be reclassified, which is not a satisfactory result. Because a second binary digit is being used, the number of leaves on the trie and the number of positions in the table must double in order to preserve the completeness of the tree. The table, which formerly had two positions, will now have four positions. The image below shows the expansion of the trie.

This image shows the new bucket, C, that will be created to support the split. The other buckets are retained. Nothing happens to the B bucket; therefore, the pointer is duplicated in the two positions in the table that correspond to a 1 in the most significant digit position. The following list shows the new bucket assignments of the records that were previously assigned to the A bucket.


    000      this will stay in bucket A  (formerly the even bucket)
    010      this will shift to bucket C
    110      this will shift to bucket C
    100      this will stay in bucket A

A little more explanation may be necessary here. Take the first entry, 000. When two digits must be used to determine the subscript, the low-order 0 is taken out and placed in the subscript, and the remainder of the entry is right shifted, producing 00; the subscript is then left shifted; next, the new low-order 0 of the entry is taken out and placed in the low-order position of the subscript, and the process terminates. The resulting subscript is binary 00, which is still 0 decimal, and would be used in a table lookup that produces the address of the A bucket. Next, look at the second entry, 010. Applying the same process places the low-order 0 in the subscript and right shifts the entry to produce 01; the subscript is shifted left, and the low-order 1 is placed into the low-order position of the subscript. The result is binary 01, which is decimal 1, and would be used in a table lookup that produces the address of the C bucket.

Without the shifting and manipulation, the second entry would have produced a two-digit subscript of 10, decimal 2, which would result in a B bucket address and a complete reorganization of everything. Definitely not the desired result!
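The same manipulation can be written directly in C. The sketch below reproduces the worked example; the name two_bit_sub is used here only for the demonstration (the general routine, makesub, appears later in the chapter).

       /* ILLUSTRATION ONLY: form a two-digit subscript by extracting
          the low-order bit twice, promoting the first bit extracted  */

     int two_bit_sub(int hash)
     {
        int i, sub = 0;
        for(i=0; i < 2; i++)
        {
          sub = sub << 1;         /* promote the digits already taken */
          sub = sub + hash % 2;   /* bring in the current low bit     */
          hash = hash >> 1;       /* expose the next digit            */
        }
        return sub;
     }

Applied to the list above, two_bit_sub(0) and two_bit_sub(4) return binary 00 (bucket A), while two_bit_sub(2) and two_bit_sub(6) return binary 01 (bucket C), exactly as desired.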

Records may continue to be added to each bucket until one fills up. If that is bucket B, no expansion of the index will be needed. A new bucket will be created, its relative record number will be placed in the table, and the appropriate records will be shifted to the new bucket. The algorithm may be written to leave the records that produce the smaller subscript in place and shift those that produce the larger subscript, or vice versa. The key for each record in the bucket must be rehashed using the increased number of significant digits and shifted or not according to the algorithm being used. During these operations, both buckets (the old and the new) must be in memory at the same time. After the split is complete, each bucket must be rewritten to its correct position in the file. The index must also be rewritten to its file.

Implementation

The bucket used for extendible hashing is not greatly different from the bucket used to minimize collision problems with static hashing, but usually it will be much larger. The following structure description shows the elements that would be needed in the file of data buckets.


      struct bucket_tag
      {
        int number_of_records;
        int bucket_bits;
        struct rec_tag record[bucket_size];
      };

The number_of_records field is updated every time a record is added. It is used to determine when the bucket is full and must be split, and it is used to limit the search for the key being retrieved. The bucket_bits field records how many bits of the hashed key are currently significant for this bucket; its role in deciding when the index must double is discussed below. The bucket size is set at implementation time based on the number of records expected and the constraints on the size of the index table.

The index table may be implemented as a file with a header record and multiple bucket addresses, each treated as a single record. The descriptions below would be used in such an implementation.


struct
{
  int bits_used_in_index;
  long buckets_used;
} header_record;

long bucket_address;

long bucket_table[MAX_BUCKETS];

The index file will be truly variable length and should be accessed sequentially. The maximum number of buckets depends on the maximum number of records in the file and the bucket size. Some methods of estimating this value will be discussed below. The number of bits used in the index is used by the makesub function to determine the proper value to return (it is shown in the code below as the argument depth). When the trie must be expanded, the number of bits will be incremented by one and the number of table positions used will double.

The hashing function used in extendible hashing may be the fold-and-add algorithm from the earlier chapter, or any other algorithm that produces an acceptable range of numeric values from the data key. An important consideration is the size of the value the algorithm produces. A hash value constrained to two bytes can address at most 32,768 array subscripts; this value multiplied by the bucket size gives the upper limit on the number of records that can be stored. It is important to remember that not all of the index entries will necessarily point to separate data buckets; several entries may point to the same bucket. In addition, not all of the data buckets will necessarily be full. Therefore, the number of index entries will always be greater than the quotient of the number of records divided by the bucket size!

If more storage space is needed and the bucket size already fills a track, then a larger return value from the hashing algorithm is in order. To achieve this, replace the 19937 divisor in the fold-and-add algorithm with a larger prime number that allows the sum to use more bits. For example, a 24-bit integer can represent a value as large as 16,777,215, giving a maximum of 16,777,216 index entries. The selection is a matter for the software designer to consider.
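The fold-and-add function itself is not repeated in this chapter. A minimal version consistent with the description above might look like the following; the two-character folding step and the exact treatment of the sum are assumptions, and the reader should substitute the version from the earlier chapter.

     #include <string.h>

       /* A SKETCH, ASSUMING THE FOLD-AND-ADD SCHEME FROM THE EARLIER
          CHAPTER: fold the key two characters at a time, add, and keep
          the running sum in range with the 19937 divisor               */

     int hasher(char *key)
     {
        int  i;
        int  len = (int)strlen(key);
        long sum = 0;

        for(i=0; i < len; i += 2)
        {
          sum = sum + key[i] * 256;      /* first character of the pair  */
          if(i + 1 < len)
            sum = sum + key[i+1];        /* second character, if present */
          sum = sum % 19937;             /* keep the sum within bounds   */
        }
        return (int)sum;
     }

To obtain more bits from the hash, the 19937 divisor would be replaced with a larger prime, as described above.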

What about the algorithm that produces the subscript from the numeric value? A C implementation follows.

     

       /*  CALLS hasher() AND USES THE RESULT TO FORM A SUBSCRIPT
           TO BE USED IN EXTENDIBLE HASHING  */

     int makesub(char *in, int depth)
     {
        char key[21];
        int i;
        int hash, sub = 0;
        int remainder;

        strncpy(key, in, 20);
        key[20] = '\0';            /* guarantee a terminated string */
        hash = hasher(key);
        for(i=0; i < depth; i++)
        {
          sub = sub << 1;          /* promote the bits already taken */
          remainder = hash%2;
          sub = sub + remainder;
       /*    sets up hash to look at the next lowest bit
             during the next iteration  */
          hash = hash >> 1;
        }
        return sub;
     }

For either insertion or retrieval, the algorithm above must be executed. The return value is used to find the relative record number of the bucket, and that bucket is read. A sequential search of the bucket locates the key for either retrieval or insertion. For retrieval, the record corresponding to the key field is displayed or otherwise used. For insertion, the existence of the same key would be an error condition (assuming we are working with unique keys). The number of records must be checked to see if there is space available; if so, the record is placed into the next available position in the bucket. Otherwise, the bucket must be split and the other reorganization steps completed before the record is placed in its appropriate bucket.

The routine above and those below require a context to be understood. The following code is an implementation of the add-a-record program. It contains all of the logic to identify the need to create buckets and to expand the index.


#include <stdio.h>
#include <stdlib.h>
#include <math.h>
#include <string.h>

#define MAX_BUCKETS 512          /* the index size and the maximum number of buckets */
#define SIZE 10                  /* the number of records per bucket                 */

struct rec_tag
{
  char key_field[21];
  /* other data fields as necessary */
};

typedef struct buck_tag
{
  long number_of_records;
  int bucket_bits;
  struct rec_tag record[SIZE];
} BUCKET;

struct head_tag
{
  int bits_used_in_index;
  long buckets_used;
};

/* prototypes for the routines developed in this chapter */
int makesub(char *in, int depth);
int doublidx(int bits, long *addresses);
int moverecs(BUCKET *old, BUCKET *new);

int main()
{
  int  depth, bits;
  int  i , limit;
  int  sub;
  int  record_found;
  long rrn;
  long address;
  int begin_new, end_new;
  long bucket_address;
  long bucket_table[MAX_BUCKETS];   /* this will hold actual byte offsets rather than rrn */
  long index_size;
  long positions_used;
  
  BUCKET bucket;
  BUCKET new_bucket;
  struct rec_tag record;
  struct head_tag header_record;

  FILE *buckets;
  FILE *index;

  index = fopen("c:\\index.dat", "rb+");      /* access will only be sequential */
  buckets = fopen("c:\\buckets.dat", "rb+");  /* access will be relative        */
  if(index == NULL || buckets == NULL)
  {
    puts("Unable to open the index or bucket files.");
    return 1;
  }

  fread(&header_record, sizeof(header_record), 1, index);
  index_size = pow(2, header_record.bits_used_in_index);
  fread(bucket_table, sizeof(long), index_size, index);

  /*  at this point would be a call to a function that would fill the 
      structure called record with valid data                               */
          
  sub = makesub(record.key_field, header_record.bits_used_in_index);

  fseek(buckets, bucket_table[sub], SEEK_SET);
  fread(&bucket, sizeof(bucket), 1, buckets);

  for(i=0, record_found=1; i < bucket.number_of_records && record_found != 0; i++)
    record_found = strcmp(record.key_field, bucket.record[i].key_field);

  if(record_found == 0)
  {
    puts("Record already exists.");
    puts("Enter another record.");
  }
  else
  {
    if(bucket.number_of_records < SIZE)
    {
      bucket.number_of_records += 1;
      memcpy(&bucket.record[bucket.number_of_records - 1], &record, sizeof(record));
      fseek(buckets, bucket_table[sub], SEEK_SET);
      fwrite(&bucket, sizeof(bucket), 1, buckets);
    }
    else /* the bucket is full and must be split  */
    {
      if(bucket.bucket_bits == header_record.bits_used_in_index) 
          header_record.bits_used_in_index = 
                                doublidx(header_record.bits_used_in_index, bucket_table);
      moverecs(&bucket, &new_bucket);
      header_record.buckets_used += 1;
      fseek(buckets, bucket_table[sub], SEEK_SET);
      fwrite(&bucket, sizeof(bucket), 1, buckets);
      fseek(buckets, 0, SEEK_END);
      bucket_address = ftell(buckets);
      fwrite(&new_bucket, sizeof(bucket), 1, buckets);
      
     /* now comes the tricky bit.  the address of the new bucket must be
        inserted into the appropriate place(s) in the index.  first, get the
        subscript for the table entry from any key in the new bucket.       */

      sub = makesub(new_bucket.record[0].key_field, header_record.bits_used_in_index);

      rrn = bucket_table[sub];      /* despite the name, this is a byte offset */
      limit = pow(2, header_record.bits_used_in_index);

      for(i=0; i < limit; i++)
      {
        if(bucket_table[i]==rrn) break;
      }
      begin_new = i;
      for(; i < limit && bucket_table[i]==rrn; i++) end_new = i;
      begin_new = begin_new + (end_new - begin_new + 1) / 2;

      for(i=begin_new; i <= end_new; i++) bucket_table[i] = bucket_address;

      fseek(index, 0L, SEEK_SET);
      fwrite(&header_record, sizeof(header_record), 1, index);
      index_size = sizeof(long) * pow(2, header_record.bits_used_in_index);
      fwrite(bucket_table, index_size, 1, index);
    
     /* now that the index and bucket maintenance is done, the rest is simple
        retrieve the bucket and add the new record.                           */

      sub = makesub(record.key_field, header_record.bits_used_in_index);

      fseek(buckets, bucket_table[sub], SEEK_SET);
      fread(&bucket, sizeof(bucket), 1, buckets);
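      /* note: this assumes the split left room in the target bucket;
         if every record landed on one side, a production version
         would repeat the split until space became available          */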

      bucket.number_of_records += 1;
      memcpy(&bucket.record[bucket.number_of_records - 1], &record, sizeof(record));
      fseek(buckets, bucket_table[sub], SEEK_SET);
      fwrite(&bucket, sizeof(bucket), 1, buckets);
    }  /*  end else */
  }  /*  end else */

  fclose(index);
  fclose(buckets);

  return 0;
}

The programmer must also consider the routine that expands the address space when necessary. When attempting to insert a new record into the file, the need to split a bucket is identified when the number_of_records field equals the bucket SIZE. The need to expand the index is indicated when the number of bits used by the bucket being split (its bucket_bits field) equals the number of bits used in the index.


int doublidx(int bits, long *addresses)
{
  int i;
  int size;
  long old_table[MAX_BUCKETS];

  size = pow(2, bits);

  if((size * 2) == MAX_BUCKETS) puts("WARNING: You are about to run out of index space!");

  memcpy(old_table, addresses, (size * sizeof(long)));

  /* each old entry is duplicated into two adjacent positions; with the
     bit-reversed subscripts, old entry i maps to new entries 2i and 2i+1 */
  for(i=0; i < size; i++)
  {
    addresses[i*2] = old_table[i];
    addresses[i*2 +1] = old_table[i];
  }

  return (bits + 1);
}

With the index expanded to handle the new bucket, the next problem is to populate the new bucket with the appropriate records from the old bucket. The following code implements a function that will distribute those records.


int moverecs(BUCKET * old, BUCKET * new)
{
  char spaces[21] = "                    ";
  int i, j, sub, n_sub;
  int odd, flag, space, check;

  old->bucket_bits += 1;    /* one more bit now distinguishes the records */
  n_sub = 0;
  for(i=0; i < SIZE; i++)
  {
    sub = makesub(old->record[i].key_field, old->bucket_bits);
    odd = sub%2;
    if(odd)     /* records whose new low-order subscript bit is 1 move out */
    {
      memcpy(&new->record[n_sub], &old->record[i], sizeof(struct rec_tag));
      strcpy(old->record[i].key_field, spaces);   /* blank the old slot */
      n_sub++;
    }
  }
  new->number_of_records = n_sub;
  new->bucket_bits = old->bucket_bits;
 /*  at this point the new bucket is properly populated  */

 /*  compact the old bucket: for each blank slot, scan forward for the
     next remaining record and move it up                               */
  for(i=0, flag=0; i < SIZE && flag==0; i++)
  {
    space = strcmp(old->record[i].key_field, spaces);
    if(space == 0)
    {
      for(j=i+1, check=0; j < SIZE && check == 0; j++)
          check = strcmp(old->record[j].key_field, spaces);
      if(check != 0)    /* a record was found at position j-1 */
      {
        memcpy(&old->record[i], &old->record[j-1], sizeof(struct rec_tag));
        strcpy(old->record[j-1].key_field, spaces);
      }
      else flag = 1;    /* nothing left to move; compaction is complete */
    }
  }

  old->number_of_records -= n_sub;
  return 0;
}

These routines implement the procedures for adding data. A separate set of processes to retrieve and update existing records is needed; a sketch of the retrieval path follows. Also required are routines to initialize the index file and the bucket file. Finally, to accommodate all possibilities, routines should be developed to delete records from the buckets and to shrink the index when required, however unlikely that may be. The reader is directed to Folk and Zoellick (1992, pp. 520-525) for pseudocode for the deletion routines.
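As an indication of how simple retrieval is once the structures above exist, the following sketch reads a record by key. It assumes the header, bucket_table, open file pointers, and include files from the program above; the function name retrieve is introduced here for illustration only.

     /* A SKETCH OF RETRIEVAL, ASSUMING THE STRUCTURES AND OPEN FILES
        FROM THE PROGRAM ABOVE.  Returns 1 and fills *out when the key
        is found; returns 0 otherwise.                                 */

     int retrieve(char *key, struct rec_tag *out, FILE *buckets,
                  long *bucket_table, int bits)
     {
        int i, sub;
        BUCKET bucket;

        sub = makesub(key, bits);            /* form the index subscript */
        fseek(buckets, bucket_table[sub], SEEK_SET);
        fread(&bucket, sizeof(bucket), 1, buckets);

        for(i=0; i < bucket.number_of_records; i++)  /* scan the bucket */
          if(strcmp(key, bucket.record[i].key_field) == 0)
          {
            memcpy(out, &bucket.record[i], sizeof(struct rec_tag));
            return 1;
          }
        return 0;
     }

The update path would follow the same steps, replacing the matching record in place and rewriting the bucket.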

Summary

It is possible to support relative access to dynamic files with a single seek (O(1)) once the index is in memory. This is achieved using an index structure based on the trie, data buckets, and the logical processes that together form a method called extendible hashing. The appeal of extendible hashing is that it can support 24-hour, 7-day operations by eliminating the maintenance downtime that static hashing may require.