The goal of this assignment is to build a functional HTTP server. This assignment will teach you the basics of network programming, client/server structures, and issues in building high performance servers. While the course lectures will focus on the concepts that enable network communication, it is also important to understand the structure of systems that make use of the global Internet. You must work individually on this lab; we will use MOSS to check for plagiarism.
At a high level, a web server listens for connections on a socket (bound to a specific port on a host machine). Clients connect to this socket and use a simple text-based protocol to retrieve files from the server. For example, you might try the following command from a UNIX machine:
% telnet www.cs.purdue.edu 80
GET / HTTP/1.0

After typing the "GET" line, press Return twice; the blank line marks the end of the request. The server will then print, on the command line, the HTML of the "front page" of the Purdue computer science website.
One of the key things to keep in mind when building your web server is that the server translates relative filenames (such as index.html) into absolute filenames in the local file system. For example, you may decide to keep all the files for your server in student/cs536/server/files/, which we call the root. When your server gets a request for /index.html, it prepends the root to the requested path and checks whether the file exists and whether the proper permissions are set on it (typically the file has to be world readable). If the file does not exist, a 404 Not Found error is returned. If the file is present but the proper permissions are not set, a 403 Forbidden (permission denied) error is returned. Otherwise, a 200 OK response is returned along with the contents of the file.
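A minimal sketch of this translation step in C follows. The helper name `resolve_file`, the decision to reject any path containing "..", and treating directories as "not found" are assumptions for illustration, not requirements of the assignment. It only inspects the mode bits; the server must still handle open() failing afterwards.

```c
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>

/* Hypothetical helper: maps a request path such as "/index.html" onto a file
 * under `root`, stores the absolute name in `full`, and returns the HTTP
 * status code the server should send. */
int resolve_file(const char *root, const char *req_path,
                 char *full, size_t full_len)
{
    struct stat st;
    int n;

    /* Assumption: refuse paths that could escape the root via "..". */
    if (req_path[0] != '/' || strstr(req_path, "..") != NULL)
        return 403;

    /* Prepend the root to the requested (relative) path. */
    n = snprintf(full, full_len, "%s%s", root, req_path);
    if (n < 0 || (size_t)n >= full_len)
        return 400; /* path too long for the buffer */

    /* Missing file (or, in this sketch, anything but a regular file). */
    if (stat(full, &st) != 0 || !S_ISREG(st.st_mode))
        return 404;

    /* The file must be world readable. */
    if (!(st.st_mode & S_IROTH))
        return 403;

    return 200; /* OK: the caller now sends the file contents */
}
```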
You should also note that web servers typically translate "GET /" to "GET /index.html". That is, index.html is assumed to be the filename if no explicit filename is present. Most web servers also allow this default filename to be configured.
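One way to apply the default filename is sketched below. The name `apply_default_index` and the `DEFAULT_INDEX` constant are hypothetical, and extending the "GET /" rule to any path ending in "/" (such as "/docs/") is an assumption, not something the assignment requires.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical default document name; most servers make this configurable. */
#define DEFAULT_INDEX "index.html"

/* Copies req_path into out, appending DEFAULT_INDEX when the path names a
 * directory (ends in '/'), so "GET /" is served as "GET /index.html".
 * Returns 0 on success, -1 if out is too small. */
int apply_default_index(const char *req_path, char *out, size_t out_len)
{
    size_t n = strlen(req_path);
    int written;

    if (n > 0 && req_path[n - 1] == '/')
        written = snprintf(out, out_len, "%s%s", req_path, DEFAULT_INDEX);
    else
        written = snprintf(out, out_len, "%s", req_path);

    return (written < 0 || (size_t)written >= out_len) ? -1 : 0;
}
```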
When you type a URL into a web browser, it will retrieve the contents of the file. If the file is of type text/html, the browser will parse the HTML for embedded objects (such as images) and then make separate connections to the web server to retrieve them. For example, if a web page contains 4 images, a total of five separate connections will be made to the web server to retrieve the HTML and the four image files. This discussion assumes the HTTP/1.0 protocol, which is what you will support first.
Next, add simple HTTP/1.1 support to your web server: support persistent connections and pipelining of client requests. You will need to add a heuristic to your web server to determine when it will close a "persistent" connection. That is, after the results of a single request are returned (e.g., index.html), the server should by default leave the connection open for some period of time, allowing the client to reuse that connection to make subsequent requests. This timeout needs to be configured in the server and ideally should be dynamic based on the number of other active connections the server is currently supporting. That is, if the server is idle, it can afford to leave the connection open for a relatively long period of time. If the server is busy, it may not be able to afford to have an idle connection sitting around (consuming kernel/thread resources) for very long.
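As one possible heuristic, the idle timeout can shrink linearly as the server fills up. In the sketch below, `keepalive_timeout`, `max_conns` (for example, the number of descriptors or threads your server is willing to dedicate to clients) and the one-second floor are all hypothetical choices; a step or exponential policy would be equally reasonable.

```c
/* Hypothetical keep-alive heuristic: the idle timeout for a persistent
 * connection shrinks linearly as the number of active connections approaches
 * max_conns, and never drops below MIN_TIMEOUT_S. base_timeout_s is the value
 * configured on the command line. */
#define MIN_TIMEOUT_S 1

int keepalive_timeout(int base_timeout_s, int active_conns, int max_conns)
{
    long t;

    if (max_conns <= 0 || active_conns >= max_conns)
        return MIN_TIMEOUT_S; /* saturated: reclaim idle connections quickly */
    if (active_conns < 0)
        active_conns = 0;

    /* Idle server -> full timeout; half full -> half the timeout; etc. */
    t = (long)base_timeout_s * (max_conns - active_conns) / max_conns;
    return t < MIN_TIMEOUT_S ? MIN_TIMEOUT_S : (int)t;
}
```

For example, with the default 300-second timeout and a limit of 100 connections, an idle server keeps connections for 300 seconds, a half-loaded one for 150 seconds, and a server with 99 active connections for only 3 seconds.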
For this assignment, you will need to support enough of the HTTP protocol to allow an existing web browser (Firefox, Safari, or Konqueror) to connect to your web server and retrieve the contents of a sample page from your server. (Of course, this requires copying the appropriate files into your server's document root.)
At a high level, your web server will be structured something like the following:
Forever loop:
    Listen for connections
    Accept new connection from incoming client
    Parse HTTP request
    Ensure well-formed request (return error otherwise)
    Determine if target file exists and if permissions are set properly (return error otherwise)
    Transmit contents of file to client (by performing reads on the file and writes on the socket)
    Close the connection
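The "Parse HTTP request" and "Ensure well-formed request" steps can start from the request line alone. The sketch below is one way to do it; the name `parse_request_line`, the buffer sizes, and supporting only GET are assumptions. The status codes 400 (Bad Request), 501 (Not Implemented) and 505 (HTTP Version Not Supported) are standard HTTP codes. A real server must still read the remaining header lines up to the blank line that ends the request.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical parser for the first line of a request, e.g.
 * "GET /index.html HTTP/1.0\r\n". Returns 0 if the line is well formed,
 * otherwise the HTTP error status to send back. method and version must
 * hold at least 16 bytes, path at least 1024 bytes. */
int parse_request_line(const char *line, char *method, char *path, char *version)
{
    char extra;

    /* Exactly three whitespace-separated fields; %c catches trailing junk. */
    if (sscanf(line, "%15s %1023s %15s %c", method, path, version, &extra) != 3)
        return 400;
    if (path[0] != '/')
        return 400;

    /* Assumption: only GET is implemented in this sketch. */
    if (strcmp(method, "GET") != 0)
        return 501;
    if (strcmp(version, "HTTP/1.0") != 0 && strcmp(version, "HTTP/1.1") != 0)
        return 505;
    return 0;
}
```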
You have three main design choices in how you structure your web server in the context of the above simple structure (note that the event model is required; the others give bonus points):
Finally, support the following commands:
Implement your assignment in either C or C++. You will want to become familiar with the interactions of the following system calls to build your system: socket(), select(), listen(), accept(), connect(). We outline a number of resources below with additional information on these system calls. Several books are also available on this topic.
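Of these calls, select() is the heart of the event model: a single thread asks the kernel which sockets are ready instead of blocking on any one of them. The helper below is a hedged sketch of that core step; its name and interface are hypothetical, and it does not show the surrounding accept/read/write loop.

```c
#include <sys/select.h>
#include <sys/socket.h> /* accept(), used by the loop around this helper */
#include <sys/time.h>
#include <unistd.h>     /* read(), close(), likewise */

/* Hypothetical core of an event-driven server: waits up to timeout_s
 * seconds for any descriptor in fds[0..nfds-1] to become readable, writes
 * the indices of ready descriptors into ready_out, and returns how many are
 * ready (0 on timeout, -1 on error; check errno for EINTR). A listening
 * socket is readable when accept() will not block; a client socket is
 * readable when request bytes or EOF have arrived. */
int wait_readable(const int *fds, int nfds, int timeout_s, int *ready_out)
{
    fd_set readset;
    struct timeval tv;
    int maxfd = -1, i, n, count = 0;

    FD_ZERO(&readset);
    for (i = 0; i < nfds; i++) {
        if (fds[i] < 0 || fds[i] >= FD_SETSIZE)
            return -1; /* select() cannot watch this descriptor */
        FD_SET(fds[i], &readset);
        if (fds[i] > maxfd)
            maxfd = fds[i];
    }

    /* select() may modify tv and readset, so both are rebuilt every call. */
    tv.tv_sec = timeout_s;
    tv.tv_usec = 0;

    n = select(maxfd + 1, &readset, NULL, NULL, &tv);
    if (n <= 0)
        return n;

    for (i = 0; i < nfds; i++)
        if (FD_ISSET(fds[i], &readset))
            ready_out[count++] = i;
    return count;
}
```

In a main loop, fds would hold the listening socket plus every open client socket. When the listening socket is ready the server calls accept(); when a client socket is ready it reads and answers a request; a return value of 0 means no socket became ready before the timeout, which is when idle persistent connections can be closed. Tracking a separate deadline per connection, rather than one shared timeout, is a design choice left to you.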
The format of the command line should be:
myhttpd [<http>] [<port>] [<timeout>]
If <http> is not passed, your web server will run in HTTP/1.0 mode. Otherwise, passing 1 or 1.1 selects the HTTP version your web server will run in. If <port> is not passed, your web server should default to port 8000. If <port> is given, your web server should make sure it is larger than 1024 and less than 65536. <timeout> is a time in seconds. If <timeout> is not passed, your web server should default the HTTP/1.1 timeout to 300 seconds (5 minutes). If <timeout> is given, your web server should make sure it is greater than 0 seconds.
Do not include the brackets in the command-line parsing. Running your web server should resemble the following:
myhttpd 1.1 65000 20
This would run the web server in HTTP/1.1 mode, listening on port 65000, with a timeout of 20 seconds.
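A sketch of the argument checks described above is shown below. The `struct config` layout and `parse_args` name are hypothetical, and reading "1" (or "1.0") as HTTP/1.0 and "1.1" as HTTP/1.1 is an assumption about the intended meaning of the <http> argument.

```c
#include <errno.h>
#include <stdlib.h>
#include <string.h>

struct config {
    int http_minor;  /* 0 for HTTP/1.0, 1 for HTTP/1.1 */
    long port;
    long timeout_s;
};

/* Parses a whole decimal string into *out; returns -1 on junk or overflow. */
static int parse_long(const char *s, long *out)
{
    char *end;

    errno = 0;
    *out = strtol(s, &end, 10);
    return (errno != 0 || end == s || *end != '\0') ? -1 : 0;
}

/* Parses: myhttpd [<http>] [<port>] [<timeout>]
 * Returns 0 on success and -1 on invalid arguments. */
int parse_args(int argc, char **argv, struct config *cfg)
{
    cfg->http_minor = 0;   /* default: HTTP/1.0 */
    cfg->port = 8000;      /* default port */
    cfg->timeout_s = 300;  /* default HTTP/1.1 timeout: 5 minutes */

    if (argc > 4)
        return -1;
    if (argc > 1) {
        /* Assumption: "1" and "1.0" select HTTP/1.0, "1.1" selects HTTP/1.1. */
        if (strcmp(argv[1], "1.1") == 0)
            cfg->http_minor = 1;
        else if (strcmp(argv[1], "1") != 0 && strcmp(argv[1], "1.0") != 0)
            return -1;
    }
    /* Port must be larger than 1024 and less than 65536. */
    if (argc > 2 && (parse_long(argv[2], &cfg->port) != 0 ||
                     cfg->port <= 1024 || cfg->port >= 65536))
        return -1;
    /* Timeout must be greater than 0 seconds. */
    if (argc > 3 && (parse_long(argv[3], &cfg->timeout_s) != 0 ||
                     cfg->timeout_s <= 0))
        return -1;
    return 0;
}
```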
Now that you have a functional web server, the second part of the assignment involves evaluating the performance of the system you have built. Build a synthetic client program that connects to your server, retrieves a file in its entirety, and disconnects. The goal of this load generator is to evaluate the performance of your server under various levels of offered load. You will measure server performance in terms of throughput (requests/sec) and latency (average time to retrieve a file). Your synthetic load generator may be multi-threaded; running it with different numbers of threads varies the offered load and lets you measure average throughput at each level.
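The basic unit of work for such a load generator is sketched below: one HTTP/1.0 fetch, timed with a monotonic clock. The function name `fetch_once` and its interface are hypothetical; it counts all received bytes, including the response headers, and excludes name resolution from the measured latency.

```c
#include <netdb.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <time.h>
#include <unistd.h>

/* Seconds elapsed between two CLOCK_MONOTONIC readings. */
static double seconds_between(struct timespec a, struct timespec b)
{
    return (double)(b.tv_sec - a.tv_sec) + (double)(b.tv_nsec - a.tv_nsec) / 1e9;
}

/* Hypothetical load-generator step: connects to host:port, sends an
 * HTTP/1.0 GET for path, reads until the server closes the connection, and
 * reports the bytes received and the latency (connect through last byte).
 * Returns 0 on success, -1 on any error. */
int fetch_once(const char *host, const char *port, const char *path,
               long *bytes_out, double *latency_s)
{
    struct addrinfo hints, *res, *ai;
    struct timespec start, end;
    char buf[8192];
    long total = 0;
    ssize_t n;
    int fd = -1, len;

    memset(&hints, 0, sizeof hints);
    hints.ai_family = AF_UNSPEC;     /* IPv4 or IPv6 */
    hints.ai_socktype = SOCK_STREAM; /* TCP */
    if (getaddrinfo(host, port, &hints, &res) != 0)
        return -1;

    /* Name resolution is excluded from the measured latency. */
    clock_gettime(CLOCK_MONOTONIC, &start);
    for (ai = res; ai != NULL && fd < 0; ai = ai->ai_next) {
        fd = socket(ai->ai_family, ai->ai_socktype, ai->ai_protocol);
        if (fd >= 0 && connect(fd, ai->ai_addr, ai->ai_addrlen) != 0) {
            close(fd);
            fd = -1;
        }
    }
    freeaddrinfo(res);
    if (fd < 0)
        return -1;

    /* The request is tiny, so a single write() is assumed to send it all. */
    len = snprintf(buf, sizeof buf, "GET %s HTTP/1.0\r\n\r\n", path);
    if (len < 0 || (size_t)len >= sizeof buf || write(fd, buf, (size_t)len) != len) {
        close(fd);
        return -1;
    }

    /* HTTP/1.0: the response ends when the server closes the connection. */
    while ((n = read(fd, buf, sizeof buf)) > 0)
        total += (long)n;
    close(fd);
    clock_gettime(CLOCK_MONOTONIC, &end);
    if (n < 0)
        return -1;

    *bytes_out = total;
    *latency_s = seconds_between(start, end);
    return 0;
}
```

Each load-generator thread could call this in a loop for a fixed period; throughput is then the total number of completed requests divided by the wall-clock length of the run, and average latency is the mean of the per-request latencies.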
A principal aspect of this assignment is to compare the performance of your web server using HTTP/1.0 versus HTTP/1.1, especially given the behavior of TCP. There is significant overhead in establishing and tearing down TCP connections (though this is less noticeable in a LAN setting), and persistent connections avoid this overhead to some extent.
A number of considerations affect the performance of HTTP/1.0 versus HTTP/1.1. Consider the pros and cons of using one connection per session versus one connection per object. The difference between the two comes down to the following:
Consider the effects of the bandwidth-delay product, round-trip times, and file sizes on this tradeoff. For example, high round-trip times exacerbate the negative effects of slow start: a file may take multiple round trips to send even when the bottleneck bandwidth would allow its entire contents to be sent in a single round trip.
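A back-of-the-envelope model makes this concrete. The sketch below (the function name is hypothetical) assumes the congestion window starts at a fixed number of segments, doubles every round trip, and stops growing at the bandwidth-delay product; it ignores losses, delayed ACKs, and the receiver window. Under HTTP/1.0 every object pays one extra round trip for the TCP handshake plus these rounds from the initial window, while a persistent connection skips the handshake and may reuse a larger window (many TCP stacks shrink the window again after an idle period).

```c
/* Hypothetical back-of-the-envelope model of TCP slow start: returns the
 * number of round trips needed to send file_bytes when the congestion window
 * starts at init_cwnd segments of mss bytes, doubles every round trip, and is
 * capped at the bandwidth-delay product bdp_bytes. Ignores losses, delayed
 * ACKs, and the receiver window. mss, init_cwnd and bdp_bytes must be
 * positive; file_bytes may be 0. */
long slow_start_rounds(long file_bytes, long mss, long init_cwnd, long bdp_bytes)
{
    long window = init_cwnd * mss; /* bytes the sender may have in flight */
    long sent = 0, rounds = 0;

    while (sent < file_bytes) {
        /* The pipe is full once the window reaches the BDP. */
        sent += window < bdp_bytes ? window : bdp_bytes;
        rounds++;
        if (window < bdp_bytes)
            window *= 2; /* slow start: window doubles per round trip */
    }
    return rounds;
}
```

For example, with a 1460-byte MSS and an initial window of one segment, a 5840-byte file needs three round trips (windows of 1460, 2920 and 5840 bytes), while an initial window of 10 segments sends it in one.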
It will be important to run your web server and load generator on different machines. It may be necessary to use multiple machines as load generators to saturate the server. You will want to keep track of CPU load on both the load generator and the server machines to determine when each system component saturates (at least with respect to CPU load).
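Standard tools such as top or vmstat are enough to watch CPU load. If you would rather log utilization from inside your own programs, one Linux-specific option (an assumption about your platform) is to sample the aggregate "cpu" line of /proc/stat twice and compare the busy and total tick counts, as in this sketch with hypothetical names.

```c
#include <stdio.h>

struct cpu_sample {
    unsigned long long busy;   /* non-idle ticks, summed over all CPUs */
    unsigned long long total;  /* busy + idle + iowait ticks */
};

/* Linux-specific: reads the aggregate "cpu" line of /proc/stat, whose first
 * fields are user, nice, system, idle, iowait, irq, softirq and steal.
 * Returns 0 on success, -1 on failure. */
int read_cpu_sample(struct cpu_sample *s)
{
    unsigned long long user, nice, sys, idle, iowait, irq, softirq, steal;
    FILE *f = fopen("/proc/stat", "r");
    int n;

    if (f == NULL)
        return -1;
    n = fscanf(f, "cpu %llu %llu %llu %llu %llu %llu %llu %llu",
               &user, &nice, &sys, &idle, &iowait, &irq, &softirq, &steal);
    fclose(f);
    if (n != 8)
        return -1;

    s->busy = user + nice + sys + irq + softirq + steal;
    s->total = s->busy + idle + iowait;
    return 0;
}

/* Fraction of time (0.0 to 1.0) the CPUs were busy between two samples. */
double cpu_utilization(const struct cpu_sample *a, const struct cpu_sample *b)
{
    unsigned long long dt = b->total - a->total;
    return dt ? (double)(b->busy - a->busy) / (double)dt : 0.0;
}
```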
Profile your server code to get a rough idea of the relative cost of different aspects of server performance. For different file sizes, how much time is spent establishing and tearing down connections versus transmitting files? Use timing calls around important portions of your code to track this time on a per-logical-operation basis. What bandwidth does the web server deliver as a function of file size, and how do latency and throughput change with file size? You may find it useful to use a scripting language, such as Perl, to control the performance evaluation.
You will find the following paper helpful:
Barford, Paul, and Crovella, Mark E.
A Performance Evaluation of Hyper Text Transfer Protocols,
In Proceedings of ACM SIGMETRICS, May 1999,
pages 188-197.
A very important aspect of your assignment will be your report describing your system architecture, implementation, and high level design decisions. For your performance evaluation, you should include a number of graphs describing the various aspects of system performance outlined above. Make sure to clearly describe how you set up your experiments. What kinds of machines did you run on? What kind of network interconnected the machines? How did you build your load generator and collect statistics? How many times did you run each experiment to ensure statistical significance? Ideally, you will include error bars to indicate standard deviation or 95% confidence intervals. Finally, your writeup should include explicit instructions on how to install, compile, and execute your code.
You will write C++ (or C) code that compiles under the GCC (GNU Compiler Collection) environment. Make sure your code compiles and runs correctly on the XINU Lab (HAAS 257) machines and the CS department's Linux machines. You should submit both your server and the client you used to test your server. Please remove all object files and submit only source code together with a Makefile. Add a readme.txt file describing how to compile and run your program from a terminal. Submit your assignment files, including your report, by the due date, as follows:
1. Place all your assignment files in a folder named pa1
2. Compress the folder as a .zip file or tarball so that Blackboard will accept the upload
3. Submit the compressed folder via Blackboard for Programming Assignment 1
Please use the discussion group on Blackboard for general questions about the assignment. For specific concerns about the assignment, please send an email to broth@cs.purdue.edu.
Note that the deadline is strict and no extensions will be given.
Please refer to the grading sheet linked here. Students will demonstrate their projects in one of the two PSOs after the assignment's due date. Slots can be reserved during the PSOs in the week before demonstration week. For special arrangements, please email broth@cs.purdue.edu. Students should prepare a 5-10 minute demo based on the grading criteria mentioned above.
Please refer to these links by clicking here.