CS 536 Spring 2024

Lab 2: Basic Socket Programming and Lightweight Network Communication [270 pts]

Due: 2/14/2024 (Wed), 11:59 PM

Objective

The objective of this lab is to practice basic network programming using UDP datagram sockets. We will also gauge the performance of network client/server apps that implement application layer pinging and lightweight file transfer.


Reading

Read chapter 2 from Peterson & Davie (textbook).


Problem 1 [120 pts]

Implement an application layer ping client/server using datagram sockets (SOCK_DGRAM) that allows a client, pingc, to check if the ping server, pings, is responding on a remote host, and, if so, estimate RTT (round-trip time). A socket is a type of file descriptor (note that Linux/UNIX distinguishes 7 file types) which is primarily used for sending data between processes running on different hosts. The hosts may run different operating systems (e.g., Linux, Windows, MacOS) on varied hardware platforms (e.g., x86 and ARM CPUs). Our lab machines run Linux on x86 PCs. Datagram sockets facilitate low overhead communication by not supporting reliability. That is, a message sent from a process running on one host to a process running on a remote host may not be received. It is up to the application to deal with the resulting consequences.

1.1 General background

A socket can be of several types. In this problem, you will use datagram (SOCK_DGRAM) sockets that invoke the UDP (User Datagram Protocol) protocol implemented inside operating systems such as Linux. That is, SOCK_DGRAM software executes as part of kernel code in kernel mode. User processes invoke system calls to enter kernel mode to run datagram socket code. We will discuss the inner workings of UDP when we study transport protocols.

In the same way that FIFOs were utilized as an abstract communication primitive to achieve inter-process communication in Problem 2, lab1, without understanding how FIFOs are implemented in kernel code, we will utilize datagram sockets as an abstract communication primitive to facilitate communication between processes running on different hosts. As noted in class, to identify a process running on an end system (e.g., PC, server, router, smartphone) under the governance of an operating system we need to know how to reach the end system -- IP (Internet Protocol) address of one of its network interfaces -- and which process we wish to talk to on the end system specified by its port number. A port number is a 16-bit non-negative integer that is bound to the PID (process ID) of a process to serve as its alias for network communication purposes.

Port numbers 0-1023, called well-known port numbers, cannot be used by app programs. Port numbers 1024-49151, referred to as registered port numbers, are usable by app processes. However, it is considered good practice to avoid doing so to the extent feasible. In many instances, networked client/server apps -- the same goes for peer-to-peer apps which are just symmetric client/server apps where a host acts both as server and client -- are coded so that they do not depend on specific port numbers. This facilitates portability and robustness. The host on which a destination process runs is identified by an IP address. Although IPv6 (version 6) with 128-bit addresses is partially deployed and used, IPv4 (version 4) with 32-bit addresses remains the dominant protocol of the global Internet. By default, we will use IPv4 addresses. Although we loosely say that an IP address identifies a host, as noted in class, an IP address identifies a network interface on an end system. Hosts that have multiple network interfaces (e.g., a smartphone has WiFi, Bluetooth, cellular, among other interfaces) may have multiple IP addresses, one per network interface. Such hosts are called multi-homed (vs. single-homed). A network interface need not be configured with an IP address if there is no need to speak IP.

Most network interfaces have unique 48-bit hardware addresses, also called MAC (medium access control) addresses, with IP addresses serving as aliases in an internetwork speaking IP. In socket programming, we use IP addresses, not MAC addresses, to facilitate communication between processes running on different hosts. IP addresses are translated to MAC addresses by operating systems before delivery over a wired/wireless link. Ultimately all communication is carried out over LANs where end system and switch interfaces are identified by MAC addresses; at that level, IP addresses are meaningless. We will study the role and inner workings of IP when discussing network layer protocols.

To facilitate human readability of IP addresses, further abstraction is implemented in the form of domain names. For example, www.cs.purdue.edu is mapped to IPv4 address 128.10.19.120 where the four decimal numbers specify the four byte values of a 32-bit address in human readable form. This translation operation is carried out with the help of a distributed database system called DNS (Domain Name System). Details of DNS and other higher layer protocols are discussed later in the course.
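Dotted decimal notation is just a human-readable rendering of a 32-bit value. The sketch below makes this concrete. It parses a dotted decimal string with inet_pton() and returns the address in host byte order. It also extracts the individual byte values. The helper names dotted_to_u32() and addr_byte() are hypothetical and are not part of any library.

```c
#include <arpa/inet.h>
#include <stdint.h>
#include <sys/socket.h>

/* Hypothetical helper: parse a dotted decimal IPv4 address such as
 * "128.10.19.120" into its 32-bit value in host byte order.
 * Returns 0 on success, -1 if the string is not a valid address. */
int dotted_to_u32(const char *dotted, uint32_t *out)
{
    struct in_addr addr;                 /* inet_pton() stores network byte order */
    if (inet_pton(AF_INET, dotted, &addr) != 1)
        return -1;
    *out = ntohl(addr.s_addr);           /* host byte order, so shifts work as expected */
    return 0;
}

/* Byte i (0 = leftmost decimal number) of a host-order IPv4 address. */
unsigned addr_byte(uint32_t addr, int i)
{
    return (addr >> (24 - 8 * i)) & 0xffu;
}
```

For example, "128.10.19.120" yields the 32-bit value (128 << 24) | (10 << 16) | (19 << 8) | 120.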

1.2 Implementation details

User interface The ping server, pings, invokes system call socket() to allocate a file descriptor (i.e., socket descriptor) of type SOCK_DGRAM. A socket descriptor is but a handle and must be further configured to specify who the communicating parties are, and possibly other properties. After socket(), bind() is called to bind the server to an IP address and an unused port number on its host. Use the Linux command ifconfig -a to determine the IPv4 address, in dotted decimal notation, assigned to Ethernet interface eth0 on our lab machines (e.g., 128.10.112.135 on amber05.cs.purdue.edu in HAAS G50). Provide the IPv4 address in dotted decimal notation (a string) as a command-line argument of your server process along with a port number to use:

% pings 128.10.112.135 44444

If the specified port number is already being used, bind() will return an error. If so, increment the port number and call bind() to make another attempt. Keep doing so until bind() succeeds or the number of attempts exceeds 10. If bind() succeeds, output the port number on stdout. If bind() fails after 10 attempts, output a message to stdout indicating failure and terminate the server process.
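Here is a minimal sketch of the server's socket()/bind() retry logic. It assumes IPv4, and it assumes that only EADDRINUSE (port already in use) warrants trying the next port. The helper name bind_with_retry() and the constant MAX_BIND_ATTEMPTS are illustrative; the handout does not prescribe them.

```c
#define _DEFAULT_SOURCE
#include <arpa/inet.h>
#include <errno.h>
#include <netinet/in.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

#define MAX_BIND_ATTEMPTS 10

/* Hypothetical helper: create a UDP socket and bind it to ip:port. If the
 * port is already in use (EADDRINUSE), try port+1, port+2, ... for at most
 * MAX_BIND_ATTEMPTS attempts in total. Returns the socket descriptor and
 * stores the bound port in *bound_port, or returns -1 on failure. */
int bind_with_retry(const char *ip, unsigned short port, unsigned short *bound_port)
{
    struct sockaddr_in addr;
    memset(&addr, 0, sizeof addr);
    addr.sin_family = AF_INET;
    if (inet_pton(AF_INET, ip, &addr.sin_addr) != 1) {
        fprintf(stderr, "invalid IPv4 address: %s\n", ip);
        return -1;
    }

    int sd = socket(AF_INET, SOCK_DGRAM, 0);
    if (sd < 0) {
        perror("socket");
        return -1;
    }

    for (int attempt = 0; attempt < MAX_BIND_ATTEMPTS; attempt++) {
        long p = (long)port + attempt;
        if (p > 65535)
            break;                                  /* ran out of port numbers */
        addr.sin_port = htons((unsigned short)p);   /* port in network byte order */
        if (bind(sd, (struct sockaddr *)&addr, sizeof addr) == 0) {
            *bound_port = (unsigned short)p;
            return sd;
        }
        if (errno != EADDRINUSE)
            break;                                  /* another port will not help */
    }
    close(sd);
    return -1;
}
```

In main(), one might call sd = bind_with_retry(argv[1], (unsigned short)atoi(argv[2]), &port). On success, print port. Otherwise, print a failure message and exit. The loop stops early on other bind() errors, such as an IP address not assigned to any local interface, because trying the next port will not fix them.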

The client, pingc, is executed on a different host with command-line arguments that specify the server's coordinates (IPv4 address and port number). In addition, the client specifies an IP address of its network interface to use for network communication. For example, amber09.cs.purdue.edu in HAAS G050 is configured with Ethernet interface eth0 with IPv4 address 128.10.112.139, hence running

% pingc 128.10.112.135 44444 128.10.112.139

on the client host specifies that the client should use 128.10.112.139 as its IPv4 address and communicate with an application layer ping server at 128.10.112.135:44444. When specifying the server's port number, use the number output by pings on stdout. For the client's port number, do not pick a particular value; specify 0, which delegates the task of finding an unused port number to the operating system. The kernel will allocate an unused port number, called an ephemeral port number, if available. Note that for many client/server applications the client's port number is not relevant. The port number used by a client process is communicated as metadata (i.e., in the header) of a UDP packet, hence the server knows what port number to use to reach the client.
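This sketch shows how the client might bind to port 0 and then use getsockname() to ask the kernel which ephemeral port it was given. The helper name bind_ephemeral() is hypothetical.

```c
#define _DEFAULT_SOURCE
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper: bind a UDP socket to the client's IPv4 address with
 * port 0 so the kernel picks an unused ephemeral port. Returns the socket
 * descriptor (or -1) and stores the kernel-chosen port in *eph_port. */
int bind_ephemeral(const char *client_ip, unsigned short *eph_port)
{
    struct sockaddr_in addr;
    memset(&addr, 0, sizeof addr);
    addr.sin_family = AF_INET;
    addr.sin_port = htons(0);                     /* 0: let the kernel choose */
    if (inet_pton(AF_INET, client_ip, &addr.sin_addr) != 1)
        return -1;

    int sd = socket(AF_INET, SOCK_DGRAM, 0);
    if (sd < 0)
        return -1;
    if (bind(sd, (struct sockaddr *)&addr, sizeof addr) < 0) {
        close(sd);
        return -1;
    }

    /* getsockname() reports the address and port actually bound. */
    socklen_t len = sizeof addr;
    if (getsockname(sd, (struct sockaddr *)&addr, &len) < 0) {
        close(sd);
        return -1;
    }
    *eph_port = ntohs(addr.sin_port);
    return sd;
}
```

On the server side, recvfrom() fills in a struct sockaddr_in with the client's IP address and port. That is how pings knows where to send its reply.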

Client The client, pingc, after creating a SOCK_DGRAM socket and binding to an IP address and port number, creates a UDP packet containing a 100-byte payload which it sends to the server using the sendto() system call. The first 2 bytes contain a short integer -- a sequence number -- that identifies the client request. The value of the third byte is a control message that tells the server what to do. If its value is 0, the server should respond immediately by sending a UDP packet to the client with a 100-byte payload whose content is a copy of the payload received. If its value is 1, the server should delay sending the response by 555 msec. If its value is 2, the server should ignore the client's packet and not respond.
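This sketch shows one way to lay out the 100-byte request. It assumes the sequence number is written in network byte order (big endian) with htons(). The handout does not fix the byte order, so client and server only need to agree. The names build_ping_request(), payload_seq() and the CMD_* constants are hypothetical.

```c
#include <arpa/inet.h>
#include <stdint.h>
#include <string.h>

#define PING_PAYLOAD 100

/* Hypothetical names for the three control values of the third byte. */
enum ping_cmd { CMD_ECHO = 0, CMD_DELAY = 1, CMD_IGNORE = 2 };

/* Fill a 100-byte request: bytes 0-1 hold the sequence number (assumed
 * network byte order), byte 2 the command, the rest are zero filler. */
void build_ping_request(unsigned char buf[PING_PAYLOAD], uint16_t seq, unsigned char cmd)
{
    uint16_t seq_net = htons(seq);
    memset(buf, 0, PING_PAYLOAD);
    memcpy(buf, &seq_net, sizeof seq_net);
    buf[2] = cmd;
}

/* Extract the sequence number from a request or echoed reply. */
uint16_t payload_seq(const unsigned char buf[PING_PAYLOAD])
{
    uint16_t seq_net;
    memcpy(&seq_net, buf, sizeof seq_net);
    return ntohs(seq_net);
}
```

The filled buffer would then be passed to sendto(sd, buf, PING_PAYLOAD, 0, ...) together with the server's struct sockaddr_in.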

Before sending a ping request, the client calls gettimeofday() to record the time stamp just before the packet is sent. Upon receiving a response from the server, the client calls gettimeofday() to take a second time stamp. The client checks the sequence number inscribed in the packet to verify that it matches the sequence number of the packet it transmitted. If they do not match, the client outputs a message to stdout indicating that sequence numbers do not match. If the sequence numbers match, the client calculates the difference of the two time stamps and outputs the value to stdout in milliseconds (msec).
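To convert two gettimeofday() stamps into milliseconds, combine the seconds and microseconds fields, as in this sketch. The names elapsed_msec() and report_rtt() are hypothetical.

```c
#include <stdint.h>
#include <stdio.h>
#include <sys/time.h>

/* Elapsed time from *start to *end, both taken with gettimeofday(), in msec. */
double elapsed_msec(const struct timeval *start, const struct timeval *end)
{
    return (double)(end->tv_sec - start->tv_sec) * 1000.0 +
           (double)(end->tv_usec - start->tv_usec) / 1000.0;
}

/* Print a mismatch message if the echoed sequence number differs from the
 * one sent, otherwise the RTT in msec. */
void report_rtt(uint16_t sent_seq, uint16_t recv_seq,
                const struct timeval *t_send, const struct timeval *t_recv)
{
    if (sent_seq != recv_seq)
        printf("sequence numbers do not match (sent %u, received %u)\n",
               (unsigned)sent_seq, (unsigned)recv_seq);
    else
        printf("RTT: %.3f msec\n", elapsed_msec(t_send, t_recv));
}
```

gettimeofday() follows the wall clock, which can be adjusted (e.g., by NTP) in the middle of a measurement. clock_gettime(CLOCK_MONOTONIC, ...) avoids that problem, but the handout asks for gettimeofday().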

Server After binding to an IP address and port number, pings calls recvfrom() and blocks on client requests. When a request arrives, the server examines the first 3 bytes of the 100-byte payload to determine what to do. The server process behaves as an iterative server and performs the task itself. That is, unlike Problem 2, lab1, where the server forked a child process to carry out requested work, pings performs the requested task itself. After responding to the client, the server goes back to blocking on recvfrom(). If the command is invalid, i.e., the third byte is not 0, 1 or 2, pings outputs a message to stdout specifying the client's IP address in dotted decimal form, port number, and the value of the third byte. Then the server goes back to blocking on recvfrom().
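Here is a minimal sketch of the iterative server loop. It assumes sd is an already bound SOCK_DGRAM socket. The dispatch on the third byte is factored into a hypothetical ping_action() so it can be checked in isolation. The 555 msec delay uses nanosleep().

```c
#define _DEFAULT_SOURCE
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdio.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <time.h>

#define PING_PAYLOAD 100
#define PING_IGNORE  (-1)   /* command 2: do not reply */
#define PING_INVALID (-2)   /* any other command value */

/* Map the command byte to an action: the reply delay in msec (0 or 555),
 * PING_IGNORE, or PING_INVALID. */
int ping_action(unsigned char cmd)
{
    switch (cmd) {
    case 0:  return 0;
    case 1:  return 555;
    case 2:  return PING_IGNORE;
    default: return PING_INVALID;
    }
}

static void sleep_msec(long ms)
{
    struct timespec ts = { ms / 1000, (ms % 1000) * 1000000L };
    nanosleep(&ts, NULL);
}

/* Iterative server: handle one request at a time on bound socket sd. */
void serve_pings(int sd)
{
    unsigned char buf[PING_PAYLOAD];
    struct sockaddr_in client;

    for (;;) {
        socklen_t len = sizeof client;
        ssize_t n = recvfrom(sd, buf, sizeof buf, 0,
                             (struct sockaddr *)&client, &len);
        if (n < 3)
            continue;                    /* too short for sequence number + command */

        int action = ping_action(buf[2]);
        if (action == PING_INVALID) {
            char ip[INET_ADDRSTRLEN];
            inet_ntop(AF_INET, &client.sin_addr, ip, sizeof ip);
            printf("invalid command %u from %s:%u\n",
                   (unsigned)buf[2], ip, (unsigned)ntohs(client.sin_port));
            fflush(stdout);
        } else if (action != PING_IGNORE) {
            if (action > 0)
                sleep_msec(action);      /* command 1: delay the reply */
            /* echo the received payload back unchanged */
            sendto(sd, buf, (size_t)n, 0, (struct sockaddr *)&client, len);
        }
    }
}
```

The server is iterative, so requests that arrive during a 555 msec delay wait in the socket's receive buffer until the server returns to recvfrom().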

1.3 Implementation and correctness testing

Implement your application layer ping app in a modular fashion and provide a Makefile in v1/ to generate pings and pingc. Create a README under v1/ that specifies the files and functions of your code, with a brief description of their roles. When sending and receiving 100-byte messages, note that the x86 Linux PCs in our labs use little endian byte ordering whereas Ethernet follows big endian. When dealing with network software it is necessary to determine whether big endian representation of network data needs to be normalized by code running at end systems. Test your ping app to verify correctness, where correctness means that your client/server app behaves according to the design specification.
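This short sketch makes the byte-order issue concrete. host_is_little_endian() inspects how the host stores 0x0102. put_u16_be() writes a 16-bit value most significant byte first, whatever the host order. Both names are hypothetical; htons()/ntohs() are the standard conversion calls.

```c
#include <arpa/inet.h>
#include <stdint.h>
#include <string.h>

/* Returns 1 if this host stores multi-byte integers least significant
 * byte first (e.g., x86), 0 otherwise. */
int host_is_little_endian(void)
{
    uint16_t probe = 0x0102;
    unsigned char first;
    memcpy(&first, &probe, 1);           /* byte at the lowest address */
    return first == 0x02;
}

/* Store v in buf[0..1] in big endian (network) order on any host.
 * A raw memcpy of v would give {0x02, 0x01} for 0x0102 on x86. */
void put_u16_be(unsigned char *buf, uint16_t v)
{
    uint16_t net = htons(v);
    memcpy(buf, &net, sizeof net);
}
```

Single-byte fields such as the command byte need no conversion; only multi-byte fields such as the sequence number do.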


Problem 2 [30 pts]

As a continuation of Problem 1, benchmark the performance of the application layer ping app to glean information about the overhead introduced by an application layer implementation of ping service and the delays incurred by packets within Purdue CS's LANs. The first and third parts are regular problems, each counting 15 points. The second part is part of the Bonus Problem, counting as an additional 15 points.

First, using pingc/pings gauge the round-trip time of UDP packets between Linux PCs in our lab HAAS G50. Set the third byte of the payload to 0. Then use the legacy app, ping (/bin/ping), which runs directly over IP without using UDP (hence app processes are not involved) to estimate RTTs (round-trip times). Compare the results. Discuss your results and findings in lab2.pdf.
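A single measurement is a poor basis for comparing pingc against /bin/ping. It helps to summarize many RTT samples in the same way. This sketch computes min/avg/max over an array of RTTs in msec. The names struct rtt_stats and summarize_rtts() are hypothetical.

```c
#include <stddef.h>

struct rtt_stats { double min, avg, max; size_t n; };

/* Summarize n RTT samples (msec); all fields are 0 when n == 0. */
struct rtt_stats summarize_rtts(const double *rtt, size_t n)
{
    struct rtt_stats s = { 0.0, 0.0, 0.0, n };
    if (n == 0)
        return s;
    double sum = 0.0;
    s.min = s.max = rtt[0];
    for (size_t i = 0; i < n; i++) {
        if (rtt[i] < s.min) s.min = rtt[i];
        if (rtt[i] > s.max) s.max = rtt[i];
        sum += rtt[i];
    }
    s.avg = sum / (double)n;
    return s;
}
```

/bin/ping prints a similar min/avg/max summary line when it finishes, so the two tools can be compared side by side over several samples.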

Second (part of Bonus Problem), use other machines in CS (not the amber PCs in our lab), on the Purdue campus, or outside (e.g., your laptop or PC at home, a coffee shop with Wi-Fi connection) to compare RTTs obtained by your application layer ping app to the values obtained by the app within our lab machines. Specify the location and other relevant details (e.g., the LAN used to connect to the global IP Internet). Discuss your results and findings.

Third, find five machines located on the east coast, west coast, somewhere in eastern Asia, somewhere in Europe, and another continent or distant place from Purdue that respond to ping. Specify their location and domain names (e.g., www.ucsd.edu). Note that some domain names may appear to be at a faraway location but due to services contracted from global service providers the responding system may be physically elsewhere, typically much closer to the location of the client. Universities tend to run their own services. One way to assess what route packets are taking over the Internet is to use the app traceroute (/usr/bin/traceroute) which may provide partial clues by outputting names of routers that packets traverse. We will examine the technique used by traceroute when discussing IP. Try to get a rough estimate of the physical distance from Purdue to the target site. Use SOL (speed of light) to get latency estimates and compare them against the ping values. By default, divide RTT by 2 to get a one-way delay estimate. Note that, in general, Internet routes are not symmetric, hence accurate one-way delay estimates are intrinsically difficult to obtain without using synchronized hardware clocks. Discuss your results and findings.
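Here is a back-of-the-envelope sketch of an SOL bound. It assumes signals propagate at about 2 x 10^8 m/s, roughly two thirds of the speed of light in vacuum, which is a common approximation for optical fiber. Using 3 x 10^8 m/s (vacuum) instead gives an even lower bound. Real paths are longer than the straight-line distance, so measured one-way estimates should exceed this bound. sol_one_way_msec() and rtt_to_sol_ratio() are hypothetical helpers.

```c
/* Assumed signal propagation speed: about 2e8 m/s in optical fiber,
 * roughly two thirds of the speed of light in vacuum (3e8 m/s). */
#define SIGNAL_SPEED_M_PER_S 2.0e8

/* Lower bound on one-way propagation delay (msec) over distance_km. */
double sol_one_way_msec(double distance_km)
{
    double seconds = distance_km * 1000.0 / SIGNAL_SPEED_M_PER_S;
    return seconds * 1000.0;
}

/* Ratio of the one-way estimate (RTT / 2) to the SOL bound; values well
 * above 1 point to indirect routes, queuing, and processing delays. */
double rtt_to_sol_ratio(double rtt_msec, double distance_km)
{
    double bound = sol_one_way_msec(distance_km);
    return bound > 0.0 ? (rtt_msec / 2.0) / bound : 0.0;
}
```

For example, a straight-line distance of 3000 km gives 3 x 10^6 m / (2 x 10^8 m/s) = 15 msec one way, or 30 msec round trip.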


Problem 3 [120 pts]

3.1 Motivation

Use the code of Problem 1 as a starting point to implement a simple file transfer protocol for small files with the caveat that it does not assure reliable transport. That is, the received file may not be an exact copy of the original file. Our aim is to trade off reliability for efficiency when it is meaningful to do so. If file transfer was successful, the app, ssftp (small size file transfer protocol), saves the transferred file in a file, received.file, in /tmp of the local file system of the lab machine where ssftp is executed. It outputs to stdout the size of the file in bytes, file transfer completion time (in msec), and transfer speed in bps (file size in bits divided by completion time in seconds). If the file transfer was incomplete, ssftp outputs to stdout a suitable message along with the percentage of bytes not received and the sequence numbers of missing packets. These two values are output if available.

3.2 Client/server design

We will implement the app as an iterative client/server app where the server, ssftpd, waits for a client request, performs the request, then attends to the next request.

Client The client, ssftp, is executed with command-line arguments that specify the filename (for simplicity a string comprised of lower-case characters a to z), the server's IPv4 address in dotted decimal form, the server's port number, the client's IPv4 address, and the payload size of data packets. For example,

% ssftp testfile 128.10.112.135 50001 128.10.112.133 1000

where testfile specifies the name of the file (its content can be ASCII or binary), 128.10.112.135 and 50001 specify the IP address/port number of the file server, 128.10.112.133 specifies the client's IP address, and 1000 specifies the payload size of data packets (in bytes) that carry the content of the requested file. The client's port number remains ephemeral. We will limit the length of the filename to 10 characters. Upon execution, ssftp sends a UDP packet to the server with a 10-byte payload. If the filename is shorter than 10 characters, pad the excess bytes with ASCII character 'Z' so that the server can decode the requested filename. Before sending the UDP packet, set an alarm (i.e., signal SIGALRM) to be generated 500 msec in the future. If no response arrives from the server and the alarm event is raised, output a suitable message to stdout and terminate the client. If a response arrives, cancel the alarm.
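This sketch covers the request construction and the 500 msec timeout. alarm() only accepts whole seconds, so the sketch uses setitimer() with ITIMER_REAL, which also raises SIGALRM. The handler uses write() and _exit() because printf() and exit() are not async-signal-safe. All function names are hypothetical.

```c
#define _DEFAULT_SOURCE
#include <signal.h>
#include <string.h>
#include <sys/time.h>
#include <unistd.h>

#define FNAME_LEN 10

/* Build the 10-byte request: the filename followed by 'Z' padding (no '\0'
 * is sent). Returns 0 on success, -1 if the name is empty or too long. */
int build_file_request(const char *name, char req[FNAME_LEN])
{
    size_t len = strlen(name);
    if (len == 0 || len > FNAME_LEN)
        return -1;
    memset(req, 'Z', FNAME_LEN);
    memcpy(req, name, len);
    return 0;
}

/* SIGALRM handler: the server did not answer in time. */
static void on_timeout(int sig)
{
    static const char msg[] = "ssftp: no response from server, terminating\n";
    (void)sig;
    ssize_t r = write(STDOUT_FILENO, msg, sizeof msg - 1);
    (void)r;
    _exit(1);
}

void install_timeout_handler(void)
{
    struct sigaction sa;
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_timeout;
    sigemptyset(&sa.sa_mask);
    sigaction(SIGALRM, &sa, NULL);
}

/* Arm a one-shot ITIMER_REAL timer that raises SIGALRM after msec
 * milliseconds; msec == 0 cancels a pending timer. */
int set_timer_msec(long msec)
{
    struct itimerval it;
    memset(&it, 0, sizeof it);                  /* zero interval: fire once */
    it.it_value.tv_sec = msec / 1000;
    it.it_value.tv_usec = (msec % 1000) * 1000;
    return setitimer(ITIMER_REAL, &it, NULL);
}
```

Call install_timeout_handler() once. Call set_timer_msec(500) just before sendto(), then set_timer_msec(0) after the first reply arrives to cancel the timer.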

The first UDP packet arriving from the server will contain the file size (in bytes) in a 3-byte payload. If the first packet received is not of size 3 bytes, the client outputs a suitable message to stdout before terminating. Subsequent UDP packets arriving from the server will contain a sequence number in the first byte followed by 1000 bytes of file content. The maximum file size allowed is 256000 bytes. The sequence numbers inscribed by the server will follow 0, 1, ..., N where N is less than or equal to 255. Knowing the file size, the client can calculate N and the payload size of the last packet, which may contain fewer than 1001 bytes.
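Here is a sketch of the size and packet-count arithmetic. The handout does not specify the byte order of the 3-byte file size, so this sketch assumes most significant byte first, matching network byte order. Three bytes suffice because 2^24 - 1 = 16777215 exceeds the 256000-byte limit. The function names are hypothetical; N equals num_data_packets() - 1.

```c
#include <stdint.h>

/* Assumption: the 3-byte size field is sent most significant byte first. */
void encode_size3(uint32_t size, unsigned char out[3])
{
    out[0] = (unsigned char)((size >> 16) & 0xffu);
    out[1] = (unsigned char)((size >> 8) & 0xffu);
    out[2] = (unsigned char)(size & 0xffu);
}

uint32_t decode_size3(const unsigned char in[3])
{
    return ((uint32_t)in[0] << 16) | ((uint32_t)in[1] << 8) | (uint32_t)in[2];
}

/* Number of data packets for size bytes with chunk content bytes per
 * packet; sequence numbers run from 0 to this value minus 1 (= N). */
uint32_t num_data_packets(uint32_t size, uint32_t chunk)
{
    return (size + chunk - 1) / chunk;
}

/* Content bytes in the last data packet (the packet is one byte longer,
 * counting the sequence number). */
uint32_t last_chunk_len(uint32_t size, uint32_t chunk)
{
    if (size == 0)
        return 0;
    uint32_t rem = size % chunk;
    return rem == 0 ? chunk : rem;
}
```

With a 1000-byte payload, a 256000-byte file needs exactly 256 data packets (sequence numbers 0 to 255). This is why the size limit matches a 1-byte sequence number. A 2560-byte file needs 3 packets, and the last one carries 560 bytes of content.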

The client uses blocking recvfrom() to wait for UDP packets from the server, ssftpd. When a data packet containing file content is received, the payload is saved in main memory in an array. When the last data packet is received, the array is saved to a local file under /tmp. Writing the payload of data packets to disk as they arrive will likely make disk I/O a bottleneck that significantly slows down performance. Flushing the file content to a local file in /tmp prevents NFS (Network File System) from affecting performance. Note that your home directories on the lab PCs are not local; they are mounted via NFS. For small file sizes we can afford to save the received file in main memory before flushing it to disk at the end.
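This sketch shows an in-memory reassembly buffer. Each data packet's content is copied to offset seq * chunk, arrival is recorded per sequence number, and the whole file is written with a single fwrite() at the end. struct rx_file, rx_store() and rx_flush() are hypothetical names. chunk is the payload size given on the command line.

```c
#include <stddef.h>
#include <stdio.h>
#include <string.h>

#define MAX_FILE_SIZE 256000
#define MAX_PKTS      256

/* Hypothetical in-memory reassembly state for one transfer. */
struct rx_file {
    unsigned char data[MAX_FILE_SIZE];  /* file content, filled as packets arrive */
    unsigned char got[MAX_PKTS];        /* got[s] == 1 once packet s has arrived */
    size_t size;                        /* file size from the 3-byte reply */
    size_t chunk;                       /* content bytes per data packet, e.g., 1000 */
};

/* Store one data packet (1-byte sequence number + content) in memory.
 * Returns the sequence number, or -1 if the packet does not fit the file. */
int rx_store(struct rx_file *f, const unsigned char *pkt, size_t n)
{
    if (n < 2 || f->size > MAX_FILE_SIZE)
        return -1;
    size_t seq = pkt[0];
    size_t len = n - 1;
    size_t off = seq * f->chunk;
    if (len > f->chunk || off >= f->size || len > f->size - off)
        return -1;
    memcpy(f->data + off, pkt + 1, len);
    f->got[seq] = 1;
    return (int)seq;
}

/* Write the reassembled file with a single fwrite() at the end. */
int rx_flush(const struct rx_file *f, const char *path)
{
    FILE *fp = fopen(path, "wb");
    if (fp == NULL)
        return -1;
    size_t written = fwrite(f->data, 1, f->size, fp);
    if (fclose(fp) != 0 || written != f->size)
        return -1;
    return 0;
}
```

After the last packet arrives, rx_flush(&f, "/tmp/received.file") writes the file in one pass. Declaring the struct static, or allocating it with malloc(), keeps about 256 KB off the stack.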

Server The server, ssftpd, calls recvfrom() to block on a client request. When a packet with a 10-byte payload containing a filename is received, the server checks whether the file exists in /tmp. If the file does not exist, the server ignores the request and calls recvfrom() to wait for the next client request. If the file exists, the server sends a UDP packet with a 3-byte payload specifying the size of the file (in bytes). The server is executed with command-line arguments
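This server-side sketch decodes the 10-byte request and checks that the file exists. Filenames contain only lower-case letters, so any upper-case 'Z' must be padding. stat() tests for existence and returns the size. request_to_path() and file_size_if_exists() are hypothetical names. Rejecting anything that is not a regular file is an extra assumption.

```c
#define _DEFAULT_SOURCE
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>

#define FNAME_LEN 10

/* Strip the 'Z' padding from a 10-byte request and build "/tmp/<name>".
 * Returns 0 on success, -1 if the request is malformed. */
int request_to_path(const char req[FNAME_LEN], char *path, size_t pathlen)
{
    size_t len = 0;
    while (len < FNAME_LEN && req[len] >= 'a' && req[len] <= 'z')
        len++;
    if (len == 0)
        return -1;
    for (size_t i = len; i < FNAME_LEN; i++)    /* the rest must be padding */
        if (req[i] != 'Z')
            return -1;
    int w = snprintf(path, pathlen, "/tmp/%.*s", (int)len, req);
    return (w < 0 || (size_t)w >= pathlen) ? -1 : 0;
}

/* Size in bytes of regular file path, or -1 if it does not exist
 * (or is not a regular file). */
long file_size_if_exists(const char *path)
{
    struct stat st;
    if (stat(path, &st) != 0 || !S_ISREG(st.st_mode))
        return -1;
    return (long)st.st_size;
}
```

If file_size_if_exists() returns -1, the server simply goes back to recvfrom(), as the specification requires.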

% ssftpd 128.10.112.135 50001

where the first two arguments specify the IP address and port number of the server. To reduce the influence of file I/O, as in the client, ssftpd will first read the entire requested file into main memory in an array. Then the server will use sendto() to transmit successive 1000-byte chunks of file data (except for the last chunk, which may be smaller), each prepended by a 1-byte sequence number, to the client. After transmitting the last packet, ssftpd waits for the next client request by blocking on recvfrom().
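This sketch shows the server's transmit loop. It assumes the file is already in memory in data, and that client holds the address returned by recvfrom(). MAX_CHUNK (1400) covers both payload sizes used in the performance evaluation. send_file_chunks() is a hypothetical name.

```c
#define _DEFAULT_SOURCE
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>

#define MAX_CHUNK 1400

/* Send size bytes of data as packets of a 1-byte sequence number followed
 * by up to chunk bytes of content. Returns the number of packets sent, or
 * -1 on error (including files that would need more than 256 packets). */
int send_file_chunks(int sd, const struct sockaddr_in *client,
                     const unsigned char *data, size_t size, size_t chunk)
{
    unsigned char pkt[1 + MAX_CHUNK];
    if (chunk == 0 || chunk > MAX_CHUNK || (size + chunk - 1) / chunk > 256)
        return -1;                          /* sequence numbers must fit in one byte */

    int seq = 0;
    for (size_t off = 0; off < size; off += chunk, seq++) {
        size_t len = size - off < chunk ? size - off : chunk;
        pkt[0] = (unsigned char)seq;
        memcpy(pkt + 1, data + off, len);
        if (sendto(sd, pkt, len + 1, 0,
                   (const struct sockaddr *)client, sizeof *client) < 0)
            return -1;
    }
    return seq;
}
```

The loop sends packets back to back without pacing. With larger files, a receiver that falls behind may overflow its socket buffer and drop packets, and this is one of the effects the benchmarks may reveal.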

3.3 Performance monitoring

To measure completion time (in msec) and transfer speed (bps), the client calls gettimeofday() before transmitting a request to the server to remember the start time. The client calls gettimeofday() again after receiving the last packet and completing disk I/O to /tmp. Completion time is calculated by subtracting the start time from this end time, and the value (in msec) is output to stdout. The client divides file size (in bits) by completion time (in seconds) to output the average file transfer speed (in bps). For small files, disk I/O may constitute a significant part of total file transfer time. To isolate the network performance component, we may take the start time stamp after receiving the first packet from the server and the end time stamp upon receiving the last data packet from the server. In this problem we will bias toward overall file transfer application performance. When the client receives a data packet, make a note of it so that if one or more packets are lost, ssftp outputs the sequence numbers of missing packets as well as the percentage of bytes not received.
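Given the completion time in msec and a per-sequence-number record of which data packets arrived, the speed and loss statistics can be computed as in this sketch. transfer_bps() and report_missing() are hypothetical names. chunk is the payload size, 1000 by default.

```c
#include <stddef.h>
#include <stdio.h>

/* Average transfer speed in bps: file size in bits / completion time in seconds. */
double transfer_bps(size_t size_bytes, double completion_msec)
{
    if (completion_msec <= 0.0)
        return 0.0;
    return (double)size_bytes * 8.0 / (completion_msec / 1000.0);
}

/* Print the sequence numbers of missing packets and return the percentage
 * of file bytes not received. got[s] != 0 means packet s arrived; npkts
 * packets were expected, each with chunk bytes except possibly the last. */
double report_missing(const unsigned char *got, size_t npkts,
                      size_t size, size_t chunk)
{
    size_t missing = 0;
    for (size_t s = 0; s < npkts; s++) {
        if (got[s])
            continue;
        missing += (s == npkts - 1) ? size - s * chunk : chunk;
        printf("missing packet %zu\n", s);
    }
    return size > 0 ? 100.0 * (double)missing / (double)size : 0.0;
}
```

For example, 256000 bytes delivered in 500 msec corresponds to 2048000 bits / 0.5 s = 4096000 bps.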

3.4 Client termination issue

The client/server app has a flaw in that the client will hang indefinitely if the last data packet from the server is lost. Discuss in lab2.pdf how you would resolve this issue. There is no need to implement it.

3.5 Performance evaluation

Implement your code in v2/ along with a Makefile and README as in Problem 1. Check that your app works correctly on two machines in the lab. Benchmark for three file sizes: 2560, 25600, and 256000 bytes. Discuss your results in lab2.pdf. Repeat the performance evaluation on the three file sizes but with the payload size increased from 1000 to 1400 bytes. Discuss your findings.
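Test files of the required sizes can be generated with a small helper such as this sketch, which writes pseudo-random bytes produced by rand(). make_test_file() and the file names in the usage line are hypothetical.

```c
#include <stddef.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: write size pseudo-random bytes to path.
 * Returns 0 on success, -1 on error. */
int make_test_file(const char *path, size_t size, unsigned int seed)
{
    FILE *fp = fopen(path, "wb");
    if (fp == NULL)
        return -1;
    srand(seed);
    for (size_t i = 0; i < size; i++) {
        if (fputc(rand() & 0xff, fp) == EOF) {
            fclose(fp);
            return -1;
        }
    }
    return fclose(fp) == 0 ? 0 : -1;
}
```

For example, on the server host one might call make_test_file("/tmp/small", 2560, 1), make_test_file("/tmp/medium", 25600, 2) and make_test_file("/tmp/large", 256000, 3). With a 1400-byte payload, these three files need 2, 19 and 183 data packets instead of 3, 26 and 256.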


Bonus problem [30 pts]

The first component of the Bonus Problem is the second part of Problem 2 (15 points). The second component of the Bonus Problem involves visiting the PSOs to measure and interpret power levels of single-mode optical fiber of different lengths using an optical power meter as demonstrated in class. This counts as 15 points. Describe in lab2.pdf the results and meaning of performing power measurements for a short single-mode cable (with SC to LC connector) and an extended single-mode wiring (using SC to SC coupler) where a 10-times longer single-mode SC to SC cable is attached to the short cable using an SC to SC coupler. Note that absolute power levels in dBm (decibels relative to 1 milliwatt) use 1 milliwatt in the denominator as reference. Hence 0 dBm means received signal strength is 1 mW, and negative values mean less than 1 mW. By comparing the received signal strength of short and long wire segments, approximate attenuation may be estimated. The wavelength emitted by the media converter is 1310 nm. Find out the ballpark expected losses for single-mode fiber carrying 1310 nm waves over typical distance specifications. How does your measurement for the single-mode links compare? Note that coupling cable segments can contribute significant signal degradation.
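For reference, here is the dBm arithmetic behind the comparison. A power P in milliwatts and its dBm value are related by the first line below. Both readings share the same 1 mW reference, so the extra loss L of the extended link in dB is simply the difference of the two readings. That difference lumps together the attenuation of the added fiber and the loss of the SC to SC coupler.

```latex
\begin{aligned}
P_{\mathrm{dBm}} &= 10\log_{10}\!\left(\frac{P}{1\,\mathrm{mW}}\right), &
P &= 1\,\mathrm{mW}\cdot 10^{P_{\mathrm{dBm}}/10},\\
L_{\mathrm{dB}} &= P_{\mathrm{short,dBm}} - P_{\mathrm{long,dBm}}, &
\frac{P_{\mathrm{long}}}{P_{\mathrm{short}}} &= 10^{-L_{\mathrm{dB}}/10}.
\end{aligned}
```

For instance, -10 dBm is 0.1 mW, and a 3 dB loss leaves about half the power (10^(-0.3) is approximately 0.50).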

The Bonus Problem is completely optional. It serves to provide additional exercises to understand material. Bonus problems help you more readily reach the 45% contributed by the lab component to the course grade.


Turn-in instructions

Electronic turn-in instructions:

We will use turnin to manage lab assignment submissions. Go to the parent directory of the directory lab2/ where you deposited the submissions and type the command

turnin -c cs536 -p lab2 lab2

You can check/list the submitted files using

turnin -c cs536 -p lab2 -v

This lab is individual effort. Please note the assignment submission policy specified on the course home page.

