PSO week 11 materials
Hash Tables
Two components:
Bucket array
Hash function

Bucket array - An array A with size N.  
If all keys are in the range [0,N-1], then the bucket array can be used 
without modifications.  Element in the (key,element) pair can be stored in 
A[key] bucket.
Empty bucket cell should have some indication that it is empty (NO_SUCH_KEY).
Collision - when two different elements are mapped to the same bucket.
Solution - chaining.

Disadvantages - The bucket are fixed-size N even only n of them are used 
(n<=N).  It wastes spaces.
It requires key that is in a range [0,N-1]




Hash functions
We need to define a function f(k) so that the result is fall into the 
range [0,N-1].  
Hash code for key k is an integer assigned to the key k.
If k1 and k2 are equal, f(k1) = f(k2).
Note: f(k1) = f(k2) even if k1 is not equal to k2.
Hash function is a function that computes the hash code
Load factor is the ratio of the number of items to the number of bucket 
cells.  n/N
To reduce the probability of collisions, the load factor should be kept below 1.
It's up to you to define the value of the load factor.  
A common choice is 0.75.
Rehashing is required if the load factor grows above the threshold.
REHASH
1. Allocate memory for another new bucket array that is larger than the 
old bucket array.  
2. For each element in the old bucket array, recomputed a new hash code 
and relocate the element to the new bucket array based on the new hash code.

A simple hash function 
 f(k) = k mod N
N should be a prime number so that the distribution of elements can be 
spread out in the array.  
For example,
{200, 210, 300, 310, ... , 500} N is 100
200 collides with 300 and 500
If N is 101, there will be no collisions.


If the key values are in the form iN+j for different i's, then there will 
still be collisions.
We can improve it by randomly generating two positive integers x, y.
f(k) = (xk+y) mod N  	where x is not equal to 0
It hopefully eliminates repeated patterns.



There are different ways to handle collision besides rehashing.
Linear probing - if f(k) is mapped to bucket A[i] but A[i] is occupied, 
then we try A[(i+1)%N] and so on until we find an empty bucket.

Quadratic probing - If A[i] is occupied, we try A[(i+f(j)) % N] for 
j=1,2,3,... and f(j) = j^2

Double Hashing - We need a second hash function h'.  If A[i] is occupied when 
the first function h(k) = i, we try the buckets A[i + f(j)] for
j=1,2,3,... and f(j) = j*h'(k).  It's up to you to decide f(j).  In the
homework assignment, it is defined for you.



Exercises
R7-11
h(i) = (2i+5) mod 11
Keys are 12, 44, 13, 88, 23, 94, 11, 39, 20, 16, and 5

1. Collisions are handled by chaining
2. Collisions are handled by linear probing.
3. Collisions are handled by quadratic probing.