To parse an input file containing text and output the number of occurrences of each word in the file, assigning a unique id to each unique word. Also, to encode the input file using the unique ids of the words.
The files for the lab can be downloaded from here. The files can be extracted by running:
tar -xvf lab07.tar
The input file contains words separated by spaces and/or newlines. The words will only consist of alphanumeric characters. You can assume that there are not more than 1000 unique words in the input file. The input file name is passed as the first command line argument. Please execute the reference implementation (translate.org) to see what you need to do when the file name is not given or the file name given cannot be opened.
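As a minimal sketch of how words separated by spaces and/or newlines can be read, the helper below counts the words on an already opened stream. The name count_tokens and the 63-character word limit (MAX_LEN) are assumptions made for illustration; the lab does not state a maximum word length, and a longer word would be split into several tokens by this sketch. Handling of a missing or unopenable file is left out on purpose, since that behavior is defined by the reference implementation.

```c
#include <assert.h>
#include <stdio.h>

/* Assumed upper bound on word length, including the terminating '\0'.
 * The lab does not specify one. */
#define MAX_LEN 64

/* Count the whitespace-separated words on an already opened stream. */
long count_tokens(FILE *in)
{
    char word[MAX_LEN];
    long n = 0;

    assert(in != NULL);

    /* "%63s" skips leading spaces and newlines, then reads at most
     * 63 non-whitespace characters into word. */
    while (fscanf(in, "%63s", word) == 1)
        n++;

    return n;
}
```

The same fscanf loop can drive the rest of the program: each word read this way is what gets looked up, counted, and encoded.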
For each word in the input file, a unique id is assigned in monotonically increasing order, with the first word in the input file getting an id of "1".
Note: Repeated occurrences of the same word do not get a new id; a new id is assigned to a word only on its first occurrence. (Look at the example below to better understand the assignment of ids.)
The program needs to output two things: the list of unique words, each with its id and its number of occurrences, followed by the input file encoded as a sequence of ids.
Consider an input file "input.txt" which contains the following text. (Note: The input file contains only alphanumeric characters and the words are separated by spaces or newlines.)
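One possible way to implement the id assignment is a fixed-size table that is searched linearly, which is sufficient given the limit of 1000 unique words. The sketch below is only an illustration: struct word_table, lookup_or_add, and the 63-character word limit (MAX_LEN) are hypothetical names and assumptions, not part of the lab files.

```c
#include <assert.h>
#include <string.h>

#define MAX_WORDS 1000  /* the lab guarantees at most 1000 unique words */
#define MAX_LEN   64    /* assumed word length limit, not stated in the lab */

/* Hypothetical table: the word stored at index i has id i + 1. */
struct word_table {
    char words[MAX_WORDS][MAX_LEN];
    int  counts[MAX_WORDS];
    int  size;                      /* number of unique words so far */
};

/* Return the id of word and increment its count. A word seen for the
 * first time is appended and gets the next id. Returns -1 if the table
 * is already full. */
int lookup_or_add(struct word_table *t, const char *word)
{
    int i;

    assert(t != NULL && word != NULL);

    for (i = 0; i < t->size; i++) {
        if (strcmp(t->words[i], word) == 0) {
            t->counts[i]++;         /* repeated word: keep its old id */
            return i + 1;
        }
    }

    if (t->size == MAX_WORDS)
        return -1;

    /* First occurrence: store a truncated, always terminated copy. */
    strncpy(t->words[t->size], word, MAX_LEN - 1);
    t->words[t->size][MAX_LEN - 1] = '\0';
    t->counts[t->size] = 1;
    t->size++;
    return t->size;                 /* ids start at 1 */
}
```

A table declared as struct word_table t = {0}; starts out empty, and calling lookup_or_add once per word in input order yields ids in the order of first occurrence.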
CS240 is interesting CS240 is the C programming
course C programming is interesting
$ ./translate input.txt
<1> CS240 2
<2> is 3
<3> interesting 2
<4> the 1
<5> C 2
<6> programming 2
<7> course 1
<1><2><3><1><2><4><5><6><7><5><6><2><3>
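The two parts of the output shown above could be printed as in the following sketch. The function print_report and its parameters are hypothetical, and the sketch assumes the ids of all words in the input have already been collected in an array. The exact whitespace, including the newline after the encoded line, is an assumption that should be checked against the reference implementation.

```c
#include <assert.h>
#include <stdio.h>

/* Print one "<id> word count" line per unique word, then the encoded
 * input as a single line of "<id>" tokens.
 *   words[i], counts[i] : word with id i + 1 and its occurrence count
 *   ids[j]              : id of the j-th word of the input file */
void print_report(FILE *out, const char *words[], const int counts[],
                  int n_words, const int ids[], int n_ids)
{
    int i;

    assert(out != NULL);

    for (i = 0; i < n_words; i++)
        fprintf(out, "<%d> %s %d\n", i + 1, words[i], counts[i]);

    for (i = 0; i < n_ids; i++)
        fprintf(out, "<%d>", ids[i]);

    /* Assumed: the encoded line ends with a newline. */
    fprintf(out, "\n");
}
```

Instead of storing every id in memory, the input file could also be read a second time (for example after rewind) and each word's id printed as it is looked up.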
Before you submit, make sure to test your implementation on LORE using the Makefile provided.
Your code must compile using the provided Makefile and run on LORE for you to earn points for this lab.
From inside lab07, type cd .. to change the working directory to the parent directory of lab07.
In the parent directory of lab07, type turnin -v -c cs240=XXX -p lab07 lab07 to turn in your work. Replace XXX with your section number.
9:30 am - 11:20 am F | F930
11:30 am - 1:20 pm F | F1130
1:30 pm - 3:20 pm F | F130
3:30 pm - 5:20 pm F | F330
9:30 am - 11:20 am R | R930
11:30 am - 1:20 pm R | R1130
3:30 pm - 5:20 pm R | R330
11:30 am - 1:20 pm T | T1130
You may now use the command turnin -c cs240=XXX -p lab07 -v to verify your submission.
This lab is due on Monday, April 11 by 11:59 pm.