CS 422

Lab 1: Building an HTTP client

Purpose of the Lab

The objective of this lab is to write an HTTP client called webdump that connects to a web server on the Internet and retrieves a file.

Pre-reading for this Lab

Chapters 24 and 25 of the textbook explain the different socket calls and give an example of a client program. It is important that you read these chapters. Pay attention to the following functions in the socket API: getservbyname, gethostbyname, getprotobyname, socket, and connect. Also note how data is written to and read from a socket with the write and read calls. Your PSO instructor will introduce the socket calls and explain the HTTP client.
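The following is a minimal sketch, in C++ like client.cpp, of how these calls fit together to open a TCP connection. The helper name connectToHost is hypothetical and is not part of the provided code. The sketch takes a numeric port, so it skips getservbyname (which maps a service name such as "http" to a port number). It also uses gethostbyname, which handles IPv4 only.

```cpp
#include <arpa/inet.h>
#include <cassert>
#include <cstddef>
#include <cstring>
#include <netdb.h>
#include <netinet/in.h>
#include <sys/socket.h>
#include <unistd.h>

// Hypothetical helper: returns a connected TCP socket descriptor, or -1 on failure.
int connectToHost(const char *host, int port) {
    assert(host != nullptr);

    // Resolve the host name (or a dotted-decimal string) to an IPv4 address.
    struct hostent *he = gethostbyname(host);
    if (he == nullptr || he->h_addrtype != AF_INET || he->h_addr_list[0] == nullptr)
        return -1;

    // Look up the protocol number for TCP; fall back to 0 (the default) if unknown.
    struct protoent *pe = getprotobyname("tcp");
    int proto = (pe != nullptr) ? pe->p_proto : 0;

    // Fill in the server's address: family, port (network byte order), IP address.
    struct sockaddr_in addr;
    std::memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_port = htons(static_cast<unsigned short>(port));
    std::memcpy(&addr.sin_addr, he->h_addr_list[0], static_cast<std::size_t>(he->h_length));

    // Create a stream socket and connect it to the server.
    int fd = socket(AF_INET, SOCK_STREAM, proto);
    if (fd < 0) return -1;
    if (connect(fd, reinterpret_cast<struct sockaddr *>(&addr), sizeof(addr)) < 0) {
        close(fd);
        return -1;
    }
    return fd;
}
```

For example, connectToHost("localhost", 80) would return a descriptor that can then be passed to write and read, or -1 if the host cannot be resolved or the connection is refused.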
 

Running the example programs

Download the file lab1-src.tar.Z, then uncompress and untar it:
uncompress lab1-src.tar.Z
tar -xvf lab1-src.tar
The directory lab1-src contains several files that you will use as the base for the webdump program you will implement in this lab.

cd to lab1-src and type "make". This builds two executables: "client" and "server". The server is a time server that can be accessed by the provided client. First choose a port number for the time server.

Run the time server in one window:

server <port-of-your-choice>
In another window run the client:
client localhost <port-of-your-choice>
Make sure you use the same port for both the client and the server. These commands assume that the client and the server run on the same machine. If you run the client on a different machine, specify the name of the host where the server is running instead of "localhost".

Description

You will use client.cpp as the base for the webdump program. Copy client.cpp to webdump.cpp before you start implementing webdump. Also, add the rules needed to build webdump to the Makefile.

You will write a command:

webdump <URL>
that requests the document identified by <URL>.

A URL has the form:

http://<host>[:port][/file]
Webdump will parse the URL to obtain the <host>, <port>, and <file>. If the URL does not follow this syntax, webdump will print an error.

If <port> is not specified, the default port number is 80. If <file> is not specified, the default file is "/".
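One way to parse the URL is sketched below. The ParsedURL struct and the parseURL function are hypothetical names. Rejecting ports outside 1 to 65535 is an assumption beyond what the lab specifies. The sketch also does not handle other URL forms, such as user names, query strings, or an uppercase "HTTP://".

```cpp
#include <cassert>
#include <cctype>
#include <string>

// Hypothetical container for the three parts of a URL.
struct ParsedURL {
    std::string host;
    int port = 80;          // default when [:port] is omitted
    std::string file = "/"; // default when [/file] is omitted
};

// Parses http://<host>[:port][/file]. Returns false on a syntax error.
bool parseURL(const std::string &url, ParsedURL &out) {
    const std::string prefix = "http://";
    if (url.compare(0, prefix.size(), prefix) != 0) return false;

    // Everything before the first '/' is <host>[:port]; the rest is the file.
    std::string rest = url.substr(prefix.size());
    std::string::size_type slash = rest.find('/');
    std::string hostPort = rest.substr(0, slash);
    out.file = (slash == std::string::npos) ? "/" : rest.substr(slash);

    // Split off an optional ":port".
    std::string::size_type colon = hostPort.find(':');
    out.host = hostPort.substr(0, colon);
    if (out.host.empty()) return false;

    out.port = 80;
    if (colon != std::string::npos) {
        std::string portStr = hostPort.substr(colon + 1);
        if (portStr.empty() || portStr.size() > 5) return false;
        for (char c : portStr)
            if (!std::isdigit(static_cast<unsigned char>(c))) return false;
        out.port = std::stoi(portStr);
        if (out.port < 1 || out.port > 65535) return false; // assumed valid range
    }
    return true;
}
```

For example, "http://localhost:8080/index.html" yields host "localhost", port 8080, and file "/index.html". By contrast, "http://www.example.com" yields port 80 and file "/".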

Overview

The HTTP protocol works as follows. To retrieve a URL, an HTTP client (e.g., a web browser) first connects to the HTTP server and then sends it a GET request that names the requested file. A browser usually also sends other information in the GET request as headers (e.g., the make and version of the browser). A simple client like the one you are going to build does not need to send these headers. The following shows a simple GET request that a client could send to a server after establishing a connection with it:
GET <sp> /SomePath/SomePage.html <sp> HTTP/1.0 <crlf> <crlf>

where <sp> stands for a single space character (ASCII 32), and
      <crlf> stands for a carriage return-linefeed pair, i.e. a carriage return (ASCII 13)
             followed by a linefeed (ASCII 10).

In a C string, the sequence <crlf> can be written with octal escapes as "\015\012", which is equivalent to "\r\n".

The request above assumes that a connection has already been established with a server (for example, "www.SomeServer.com").

To retrieve the default page from the server instead of a specific page, send the following request (note the '/' in the request):

GET <sp> / <sp> HTTP/1.0 <crlf> <crlf>
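Below is a minimal sketch of building this request string. It assumes the <file> part has already been extracted from the URL. buildGetRequest is a hypothetical helper name.

```cpp
#include <cassert>
#include <string>

// Builds a minimal HTTP/1.0 GET request for the given file path.
// CRLF is written as "\015\012" (octal), the same two bytes as "\r\n".
std::string buildGetRequest(const std::string &file) {
    const std::string crlf = "\015\012";
    std::string path = file.empty() ? "/" : file; // default file is "/"
    // Request line, then an empty line (a second CRLF) ends the request.
    return "GET " + path + " HTTP/1.0" + crlf + crlf;
}
```

Calling buildGetRequest("/") produces the default-page request shown above.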

Procedure and Details

Read Chapters 24 and 25 of your textbook. Using the same general approach as the example client in Chapter 25, implement an HTTP client that connects to a given server and retrieves a specified URL.
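After connecting and building the request, the client writes the request to the socket and reads the reply until the server closes the connection. An HTTP/1.0 server normally closes the connection after sending its response, so a read that returns 0 marks the end of the document. The sketch below assumes that approach. writeAll and copyUntilEOF are hypothetical helper names. writeAll loops because write may transfer fewer bytes than requested.

```cpp
#include <cassert>
#include <cerrno>
#include <cstddef>
#include <unistd.h>

// Writes all len bytes of buf to fd, retrying on short writes.
// Returns true on success, false on an error.
bool writeAll(int fd, const char *buf, std::size_t len) {
    while (len > 0) {
        ssize_t n = write(fd, buf, len);
        if (n < 0) {
            if (errno == EINTR) continue; // interrupted by a signal: retry
            return false;
        }
        buf += n;
        len -= static_cast<std::size_t>(n);
    }
    return true;
}

// Copies everything read from inFd to outFd until the peer closes the
// connection (read returns 0). Returns the number of bytes copied, or -1.
long copyUntilEOF(int inFd, int outFd) {
    char buf[4096];
    long total = 0;
    for (;;) {
        ssize_t n = read(inFd, buf, sizeof(buf));
        if (n == 0) return total; // end of file: the server closed the connection
        if (n < 0) {
            if (errno == EINTR) continue;
            return -1;
        }
        if (!writeAll(outFd, buf, static_cast<std::size_t>(n))) return -1;
        total += n;
    }
}
```

In webdump this might be used as writeAll(fd, req.data(), req.size()) followed by copyUntilEOF(fd, STDOUT_FILENO), where fd is the connected socket and req is the request string. The output then contains the response headers followed by the document.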

Deadline

This lab is due Wednesday, January 28th at 11:59pm. Write your files in a directory named lab1-src and make sure that webdump can be built by typing "make".

To turn in your project, type:

turnin -c cs422 -p lab1 lab1-src
Reading and References

[1] Chapter 25 in `Computer Networks and Internets' by Douglas E. Comer - "Example of a client and a server".

[2] Chapter 7 in `Internetworking with TCP/IP - Vol 3' by Douglas E. Comer and David L. Stevens - "Example client software".

[3] RFC 1945 defines the HTTP/1.0 protocol. You can access it by typing `rfc 1945' on your console.