Additional instructions for Project 4

1. We distinguish between lowercase and uppercase letters. For example, the
words "The" and "the" are different because of the case of the first letter
't'. The integer value used for each letter should be the ASCII value of that
letter.

2. You have to read the words directly from the files text1 and text2 using
FileReader. No command-line arguments are provided. The project is executed
using
        java project4
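
A minimal sketch of reading one of the input files with FileReader, wrapped in
a BufferedReader so the text can be read line by line. The names TextLoader
and readAll are hypothetical and chosen only for illustration; splitting the
text into words is a separate step.

```java
import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;

// Hypothetical helper class; the name is not prescribed by the assignment.
final class TextLoader {
    // Reads the whole file into one string, ending every line with '\n'.
    static String readAll(String fileName) throws IOException {
        StringBuilder sb = new StringBuilder();
        // try-with-resources closes the reader even if reading fails
        try (BufferedReader in = new BufferedReader(new FileReader(fileName))) {
            String line;
            while ((line = in.readLine()) != null) {
                sb.append(line).append('\n');
            }
        }
        return sb.toString();
    }
}
```

Under this sketch, the program would call TextLoader.readAll("text1") and
TextLoader.readAll("text2") with the file names fixed in the code, since no
command-line arguments are passed.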
 

3. Index 0 is the rightmost letter of the word. For example, if the word is
"Purdue", then x0 is 'e', x1 is the rightmost 'u', x2 is 'd', and so on.

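The letter values and the right-to-left indexing described above can be
sketched as follows. LetterValues and valueAt are hypothetical names, not part
of the assignment; the sketch only shows how x0, x1, ... map to ASCII values
and does not implement any hash function.

```java
// Hypothetical helper class; the name is an assumption made for illustration.
final class LetterValues {
    // Returns x_i: the ASCII value of the letter at index i,
    // where index 0 is the rightmost letter of the word.
    static int valueAt(String word, int i) {
        return word.charAt(word.length() - 1 - i);
    }
}
```

For "Purdue", valueAt("Purdue", 0) is the value of 'e'. Because case matters,
the leftmost letter of "The" has value 84 ('T'), while the leftmost letter of
"the" has value 116 ('t').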
4. Every punctuation mark separates one word from another. The special case is
the hyphen ('-'). If two words are connected with a hyphen, they form a single
word. For example, "foo-bar" is a single word and NOT two different words
separated by a hyphen. Each word is inserted only once.
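
One possible way to apply this rule is a regular expression, sketched below.
The sketch assumes that a word is a run of ASCII letters, optionally joined by
single hyphens, and that everything else (whitespace, digits, punctuation)
acts as a separator; the instructions do not spell out those details. The
names WordSplitter and distinctWords are hypothetical, and the LinkedHashSet is
only a stand-in for the project's own hashtable.

```java
import java.util.LinkedHashSet;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class; the name is an assumption made for illustration.
final class WordSplitter {
    // Assumption: letters joined by single hyphens form one word ("foo-bar").
    private static final Pattern WORD =
            Pattern.compile("[A-Za-z]+(?:-[A-Za-z]+)*");

    // Returns each word once, in order of first appearance (case-sensitive).
    static Set<String> distinctWords(String text) {
        Set<String> words = new LinkedHashSet<>();
        Matcher m = WORD.matcher(text);
        while (m.find()) {
            words.add(m.group());
        }
        return words;
    }
}
```

With this pattern, "The foo-bar, the end. The end!" yields the four words The,
foo-bar, the and end.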

5. Since the size of a hashtable should be a prime number, you do not simply
double the size of the hashtable whenever the load factor rises above 0.95.
Instead, proceed as follows: start with size B = 71; if you have to increase
it, go to B = 149, then to B = 307, then to B = 617. The final size of the
hashtable will be one of these values of B.
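
The size schedule can be sketched as a small lookup. The sketch takes the load
factor to be the number of stored entries divided by the table size, which is
the usual definition; TableSizes, nextSize and needsGrow are hypothetical
names.

```java
// Hypothetical helper class; the name is an assumption made for illustration.
final class TableSizes {
    // Values of B listed in the instructions.
    static final int[] SIZES = {71, 149, 307, 617};
    static final double MAX_LOAD = 0.95;

    // Load factor = stored entries / table size.
    static double loadFactor(int entries, int size) {
        return (double) entries / size;
    }

    // True once the load factor has risen above 0.95.
    static boolean needsGrow(int entries, int size) {
        return loadFactor(entries, size) > MAX_LOAD;
    }

    // Returns the next size in the schedule; a size that is not in the
    // schedule, or the largest one (617), is returned unchanged.
    static int nextSize(int current) {
        for (int i = 0; i < SIZES.length - 1; i++) {
            if (SIZES[i] == current) {
                return SIZES[i + 1];
            }
        }
        return current;
    }
}
```

With B = 71, 67 entries give a load factor of about 0.944 and 68 entries give
about 0.958, so under this reading the 68th distinct word triggers the move to
B = 149.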

6. You have to calculate the average number of comparisons for different
load factor ranges. Keep five different counts, one for each range.
Whenever you find a collision in the hashtable for text1, check the load factor
at that point and increment the count that belongs to that load factor.
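
A minimal sketch of picking the count to increment. The range boundaries are
taken from the sample output at the end of these instructions; LoadRanges and
rangeIndex are hypothetical names, and returning -1 for a load factor of 0.95
or more is an assumption of this sketch.

```java
// Hypothetical helper class; the name is an assumption made for illustration.
final class LoadRanges {
    // Exclusive upper bounds of the five ranges:
    // [0,0.5) [0.5,0.65) [0.65,0.75) [0.75,0.85) [0.85,0.95)
    static final double[] UPPER = {0.5, 0.65, 0.75, 0.85, 0.95};

    // Returns the index (0..4) of the range containing the load factor,
    // or -1 if the load factor is 0.95 or more (assumed to be uncounted).
    static int rangeIndex(double loadFactor) {
        for (int i = 0; i < UPPER.length; i++) {
            if (loadFactor < UPPER[i]) {
                return i;
            }
        }
        return -1;
    }
}
```

The returned index can then be used to select one of the five counts, for
example an element of a five-element array.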

7. Each time you pass the load factor of 0.95, you have to make a new hashtable
with roughly double the number of entries (use the next value provided for B).
Each time you make a new hashtable, you have to rehash the entries that were
present in the old hashtable into the new one. You also have to throw away the
previous counts (used for the average number of comparisons) and start
recomputing them as you insert into the new hashtable.
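
The grow-and-rehash step might look like the sketch below. Several parts are
assumptions made only so the example is complete: linear probing as the
collision strategy, String.hashCode() as a placeholder for the project's hash
function, and a single comparisons counter standing in for the five per-range
counts. RehashSketch and its members are hypothetical names.

```java
// Hypothetical class; a sketch of growing and rehashing, not a full solution.
final class RehashSketch {
    private static final int[] SIZES = {71, 149, 307, 617}; // values of B
    private int sizeIndex = 0;
    private String[] slots = new String[SIZES[0]];
    private int entries = 0;
    long comparisons = 0; // stand-in for the per-range counts

    // Placeholder hash; NOT the hash function required by the project.
    private static int hash(String w, int size) {
        return Math.floorMod(w.hashCode(), size);
    }

    // Inserts a word once; returns false if it was already present.
    boolean insert(String w) {
        int i = hash(w, slots.length);
        for (int step = 0; step < slots.length; step++) {
            if (slots[i] == null) {
                slots[i] = w;
                entries++;
                boolean overloaded = (double) entries / slots.length > 0.95;
                if (overloaded && sizeIndex < SIZES.length - 1) {
                    grow();
                }
                return true;
            }
            comparisons++; // slot occupied: one comparison
            if (slots[i].equals(w)) {
                return false;
            }
            i = (i + 1) % slots.length; // linear probing (assumed strategy)
        }
        throw new IllegalStateException("hashtable is full");
    }

    // Moves every stored word into a table of the next size B.
    private void grow() {
        String[] old = slots;
        slots = new String[SIZES[++sizeIndex]];
        for (String w : old) {
            if (w == null) {
                continue;
            }
            int i = hash(w, slots.length);
            while (slots[i] != null) {
                i = (i + 1) % slots.length;
            }
            slots[i] = w;
        }
        comparisons = 0; // previous counts are thrown away
    }

    int size() { return slots.length; }
    int entries() { return entries; }
}
```

In this sketch, comparisons made while moving the old entries are not counted;
whether they should be is a detail the instructions leave open.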

8. For counting the distinct words, count every word in a file only once, no
matter how often it occurs. For example, if the word 'ALICE' appears twice,
count it only once.

9. Finally, you have to compute the average number of comparisons per word
from the counts for each range. The count divided by the total number of words
inserted in that range gives the average number for that range.
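
The division described above can be sketched as follows. ComparisonStats,
record and average are hypothetical names, and reporting 0.0 for a range in
which no word was inserted is an assumption of this sketch.

```java
// Hypothetical class; the name is an assumption made for illustration.
final class ComparisonStats {
    private final long[] comparisons = new long[5]; // count per range
    private final long[] inserted = new long[5];    // words inserted per range

    // Records one inserted word and the comparisons it needed.
    void record(int range, int comparisonsForWord) {
        comparisons[range] += comparisonsForWord;
        inserted[range]++;
    }

    // Average comparisons per word for one range (0.0 if the range is empty).
    double average(int range) {
        if (inserted[range] == 0) {
            return 0.0;
        }
        return (double) comparisons[range] / inserted[range];
    }
}
```

For example, two words inserted in the first range with 1 and 3 comparisons
give an average of 4 / 2 = 2.0 for that range.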

10. This project will be graded manually, so you need not worry about extra
spaces or delimiters. However, your output format should look like this:
(Note: This is not a sample solution.)

Common words: Alice Jump She

Distinct words:
text1=10
text2=6

Average Number of Comparisons:
[0,0.5) = 2.0
[0.5,0.65) = 3.2
[0.65,0.75) = 6.10
[0.75,0.85) = 9.02
[0.85,0.95) = 11