CS 536 Spring 2024

Lab 3: Traffic Monitoring, Remote Command Server, Fast and Reliable File Transport [310 pts]

Due: 2/28/2024 (Wed.), 11:59 PM

Objective

The objectives of this lab are to monitor Ethernet LAN traffic by capturing and analyzing Ethernet frames, implement a network remote command server using TCP, and modify the fast but unreliable file transfer app of lab2 to support reliable transport of small files using negative ACKs.


Reading

Read chapter 3 from Peterson & Davie (textbook).


Problem 1 [70 pts]

1.1 System set-up, traffic generation, and capture

In this problem, you will sniff Ethernet frames on an Ethernet interface veth0 on one of the amber machines in our lab. When sniffing Ethernet frames, operating systems including Linux require that the interface operate in promiscuous mode, which requires superuser privilege. On an amber machine, run

% sudo /usr/local/etc/tcpdumpwrap-veth0 -c 24 -w - > testlogfile

which will capture 24 Ethernet frames and save them into testlogfile. Enter your password when prompted. tcpdumpwrap-veth0 is a wrapper of tcpdump that allows sudo execution. Check the man page of tcpdump for available options. To generate traffic arriving on veth0, use the file transfer app from Problem 3, lab2, with server ssftpd running on an amber machine bound to IP address 192.168.1.1. The client, ssftp, is executed from the same machine using

% veth 'ssftp filename 192.168.1.1 srv-port 192.168.1.2 payload-size'

where veth executes ssftp at a machine with IP address 192.168.1.2. Thus the client transmits/receives packets on interface 192.168.1.2, and the server transmits/receives traffic through interface 192.168.1.1. 192.168.1.1 is a private IP address that is not routable on the global IP Internet which has been configured for veth0. 192.168.1.2 is the IP address at the opposite end of veth0 as if the two interfaces were connected by a point-to-point Ethernet link.

For security reasons, we cannot perform sniffing on eth0, which is the interface through which the lab machines, a shared resource, are connected to the Internet. Therefore we use virtual/dummy interfaces in Linux that allow veth0 to be configured as a separate Ethernet interface -- albeit virtual, not physical -- with private IP address 192.168.1.1 that can reach 192.168.1.2, and vice versa. Thus performing veth at 192.168.1.2 on an amber machine does not execute ssftp on a different physical machine equipped with an Ethernet interface veth0 with IP address 192.168.1.2. Instead, both server ssftpd and client ssftp run on the same physical machine, and packet forwarding is handled virtually by Linux as if 192.168.1.2 were a physical Ethernet interface on a separate machine. For our Ethernet frame sniffing and inspection exercise, this will suffice.

1.2 Traffic analysis

Use a test file and payload size such that at least 24 Ethernet frames are generated by the file transfer client/server app, which will be captured by tcpdumpwrap-veth0. After doing so, analyze testlogfile using wireshark or tcpdump (tcpdump is also an analysis tool). Wireshark (/usr/bin/wireshark-gtk), the successor of ethereal, is a popular graphical tool for analyzing (as well as capturing) traffic logs in pcap format. Use wireshark or tcpdump to inspect the 24 captured Ethernet frames. Using the MAC address associated with 192.168.1.1 (perform ifconfig -a on an amber machine) and the MAC address associated with 192.168.1.2 (perform veth 'ifconfig -a' on the same amber machine), identify the relevant Ethernet frames whose payloads are IPv4 packets that, in turn, contain UDP packets generated by the client/server app as payload. Inspect the type field of the captured Ethernet frames to determine if they are DIX (i.e., Ethernet II) frames.

The first 20 bytes of the Ethernet payload comprise the IP header and the next 8 bytes the UDP header. The last 8 bytes of the IP header specify the source IP address and the destination IP address. Check their values against the IP addresses used by the file transfer app. The first four bytes of the UDP header specify the source and destination ports. Check that they match the port numbers used by the client/server. Inspect the remaining bytes of the captured Ethernet frames, which comprise the application layer payload communicated between client and server by calling sendto(). Wireshark/tcpdump will provide output where header fields are already decoded. Inspect the captured raw data in hexadecimal form to match the IPv4 addresses and UDP port numbers yourself, and use wireshark/tcpdump as a confirmation tool. Do the same when analyzing the application layer payload carried by the Ethernet frames. Discuss your findings in lab3.pdf.
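The byte offsets described above can be checked by hand with a small decoder. The following is a minimal sketch, assuming an Ethernet II frame with a 14-byte header and an IPv4 header without options (20 bytes); the struct and function names (udp_frame_info, decode_udp_frame, is_ethernet_ii) are hypothetical and not part of the lab code.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Offsets assume Ethernet II + IPv4 without options + UDP (assumption). */
enum { ETH_HDR_LEN = 14, IP_HDR_LEN = 20, UDP_HDR_LEN = 8 };

struct udp_frame_info {            /* hypothetical helper type */
    uint16_t ethertype;            /* 0x0800 means IPv4 */
    uint8_t  src_ip[4], dst_ip[4]; /* last 8 bytes of the IP header */
    uint16_t src_port, dst_port;   /* first 4 bytes of the UDP header */
    size_t   payload_off;          /* where the application payload starts */
};

/* Read a 16-bit value stored in network (big-endian) byte order. */
static uint16_t be16(const uint8_t *p)
{
    return (uint16_t)((p[0] << 8) | p[1]);
}

/* Type/length values of 0x0600 or more denote an Ethernet II type field. */
int is_ethernet_ii(uint16_t type_or_len)
{
    return type_or_len >= 0x0600;
}

/* Returns 0 on success, -1 if the frame is too short or is not IPv4/UDP. */
int decode_udp_frame(const uint8_t *f, size_t len, struct udp_frame_info *out)
{
    if (len < ETH_HDR_LEN + IP_HDR_LEN + UDP_HDR_LEN)
        return -1;
    out->ethertype = be16(f + 12);          /* bytes 12-13: type field */
    if (out->ethertype != 0x0800)
        return -1;

    const uint8_t *ip = f + ETH_HDR_LEN;
    if ((ip[0] >> 4) != 4 || (ip[0] & 0x0f) != 5)
        return -1;                          /* need IPv4 with a 20-byte header */
    if (ip[9] != 17)
        return -1;                          /* protocol 17 is UDP */
    memcpy(out->src_ip, ip + 12, 4);
    memcpy(out->dst_ip, ip + 16, 4);

    const uint8_t *udp = ip + IP_HDR_LEN;
    out->src_port = be16(udp);
    out->dst_port = be16(udp + 2);
    out->payload_off = ETH_HDR_LEN + IP_HDR_LEN + UDP_HDR_LEN;
    return 0;
}
```

Feeding the hexadecimal bytes of one captured frame into decode_udp_frame() and comparing the result with what wireshark/tcpdump reports is one way to carry out the manual check.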

Note: To run tcpdump, please use the command, % tcpdump -r - < testlogfile, instead of, % tcpdump -r testlogfile, which will trigger an "access denied" error. You may also run the command-line version of wireshark, /usr/bin/tshark, instead of tcpdump. To run wireshark, you will have to be physically at a lab machine. If you prefer, you may install wireshark on a Windows, MacOS, or Linux machine, copy testlogfile to the machine and run wireshark to inspect captured frames.


Problem 2 [120 pts]

Modify Problem 2 of lab1 so that server and client run on different machines in the lab and use stream sockets SOCK_STREAM to communicate in place of FIFOs.

2.1 General background

Stream sockets use TCP (Transmission Control Protocol) which implements sliding window ARQ to achieve reliable data communication. Sockets of type SOCK_STREAM export a byte stream abstraction to app programmers where a sequence (or stream) of bytes sent by a sender using system call write() is received as a sequence of bytes in the same order and without "holes" by the receiver when calling read(). Thus the operating system shields the app from having to deal with the consequences of unreliable communication networks, which describes most real-world networks today. In contrast to SOCK_DGRAM sockets implementing UDP where payload carried by one packet is not part of a stream and data transport is unreliable, SOCK_STREAM sockets maintain a notion of persistent state referred to as a connection between sender and receiver which is needed to implement sliding window. Other aspects of TCP such as congestion control that require a connection between sender and receiver will be covered under transport protocols.

SOCK_STREAM inherently incurs more overhead than SOCK_DGRAM, which is reflected in how a connection between sender and receiver is set up before communication can commence using write() and read() system calls. SOCK_STREAM sockets are well-suited for implementing concurrent client/server apps including file and web servers. A server calls socket() to allocate a SOCK_STREAM socket descriptor followed by bind() analogous to SOCK_DGRAM in lab2. After bind(), the server calls listen() to mark the socket descriptor as passive, meaning that a server waits on connection requests from clients. The second argument of listen() specifies how many connection requests are allowed pending. For our purposes, 5 will do. Following listen(), the server calls accept() which blocks until a client connection request arrives. When a client request arrives, accept() returns a new socket descriptor that can be used to communicate with the client while the old socket descriptor (the first argument of accept()) is left intact so that it can be re-used to accept further connection requests. The new socket descriptor returned by accept() is called a full association which is a 5-tuple

(SOCK_STREAM, server IP address, server port number, client IP address, client port number)

that specifies the protocol type (SOCK_STREAM or TCP), server and client coordinates. The original socket descriptor is called a half association since the client IP and port number remain unspecified so that the descriptor can be re-used to establish a new connection upon calling accept().
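The socket(), bind(), listen(), accept() sequence described above can be sketched as follows. This is a minimal illustration, not the required structure of your server; the function names make_listener and accept_client are hypothetical, and the IP address and port are supplied by the caller.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Build the half association: socket() -> bind() -> listen().
   Returns the listening descriptor, or -1 on error. */
int make_listener(const char *ip, uint16_t port)
{
    struct sockaddr_in addr;
    memset(&addr, 0, sizeof addr);
    addr.sin_family = AF_INET;
    addr.sin_port = htons(port);
    if (inet_pton(AF_INET, ip, &addr.sin_addr) != 1)
        return -1;                      /* not a dotted-decimal IPv4 address */

    int lfd = socket(AF_INET, SOCK_STREAM, 0);
    if (lfd < 0)
        return -1;
    if (bind(lfd, (struct sockaddr *)&addr, sizeof addr) < 0 ||
        listen(lfd, 5) < 0) {           /* allow 5 pending connection requests */
        close(lfd);
        return -1;
    }
    return lfd;
}

/* Block until a client connects. The returned descriptor is the full
   association; peer receives the client's IP address and port number. */
int accept_client(int lfd, struct sockaddr_in *peer)
{
    socklen_t len = sizeof *peer;
    return accept(lfd, (struct sockaddr *)peer, &len);
}
```

A concurrent server would typically call accept_client() in a loop, fork a child to serve the new descriptor, and close that descriptor in the parent while keeping the listening descriptor open.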

On the client side, instead of calling bind() a client calls connect() with the server's IP address and port number. The operating system will fill in the client's IP address and assign an unused ephemeral port number. If the client is multi-homed and wants to use a specific network interface to send/receive data, or wants to use a specific port number, then bind() can be used to do so. For typical clients, bind() is not needed since connect() performs these actions. When connect() returns, a client can send its request using write() analogous to Problem 3, lab1.
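The client side can be sketched in the same spirit. The function names connect_to_server and local_port are hypothetical; local_port uses getsockname() only to show the ephemeral port that the operating system picked after connect().

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Connect to the server without calling bind().
   Returns a connected descriptor, or -1 on error. */
int connect_to_server(const char *ip, uint16_t port)
{
    struct sockaddr_in srv;
    memset(&srv, 0, sizeof srv);
    srv.sin_family = AF_INET;
    srv.sin_port = htons(port);
    if (inet_pton(AF_INET, ip, &srv.sin_addr) != 1)
        return -1;

    int fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd < 0)
        return -1;
    if (connect(fd, (struct sockaddr *)&srv, sizeof srv) < 0) {
        close(fd);
        return -1;
    }
    return fd;
}

/* Report the local port number chosen by the OS, or -1 on error. */
int local_port(int fd)
{
    struct sockaddr_in me;
    socklen_t len = sizeof me;
    if (getsockname(fd, (struct sockaddr *)&me, &len) < 0)
        return -1;
    return ntohs(me.sin_port);
}
```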

2.2 Implementation details

On the server side, a difference from Problem 2, lab1, is that client requests do not share a common FIFO queue but are transmitted through separate full association SOCK_STREAM sockets. That is, there exists a pairwise connection between a specific client and the shared server. An additional functional feature to add to the server is that upon receiving a client request, if execvp() fails the server informs the client by sending a message consisting of the string "command failed" (including the end-of-string character '\0'). If the command succeeds execvp() does not return, hence no response is sent to the client by the child process. After sending a request to the server, a client blocks on read(). Before calling read(), the client registers a signal handler (i.e., callback function) for SIGALRM using signal(), then sets an alarm for 0.5 seconds by calling ualarm(). If the blocking read() system call returns before the timer expires, the client outputs to stdout the message received from the server, then terminates by calling exit(). If the timer expires, the signal handler calls exit() to terminate the client. Hence the absence of a message from the server indicates that the command has been executed successfully.
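The client's wait-with-timeout step might look like the sketch below. The function name await_reply is hypothetical. One deliberate deviation, stated as an assumption: the handler here calls _exit() rather than exit(), since _exit() is async-signal-safe; substitute exit() if you follow the handout literally.

```c
#define _DEFAULT_SOURCE 1   /* exposes ualarm() on glibc under strict -std modes */
#include <signal.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/* Timer expired with no reply: the command is taken to have succeeded. */
static void on_alarm(int sig)
{
    (void)sig;
    _exit(0);
}

/* Wait up to 0.5 seconds for the server's reply on fd.
   Returns the number of bytes read, or -1 on error. */
ssize_t await_reply(int fd, char *buf, size_t cap)
{
    if (signal(SIGALRM, on_alarm) == SIG_ERR)
        return -1;
    ualarm(500000, 0);              /* one-shot alarm after 500000 microseconds */
    ssize_t n = read(fd, buf, cap); /* blocks until data arrives or SIGALRM fires */
    ualarm(0, 0);                   /* reply arrived first: cancel the alarm */
    return n;
}
```

The caller would print the returned bytes to stdout and then terminate.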

Problem 2 implements a remote command server which presents a security vulnerability that must be guarded against. Our first defense is to inspect the IPv4 source address of the client and check that the 3-byte prefix is "128.10.112" where prefix "128.10" signifies one of Purdue's IP address blocks to the outside world and "112" specifies the LAN that the amber machines are connected to. Source address filtering is imperfect since forging of the source address --- called spoofing --- is a technique that attackers (and sometimes apps, for valid purposes) may utilize. Gateway routers at enterprise networks may, therefore, implement source address filtering such as not allowing IP packets with certain source addresses to pass through. In wireless LANs, routers may be configured to reject entry of 802.11 frames with source MAC addresses not belonging to the basic service set (BSS) as a layer of defense. Our second defense is to restrict commands to "ls" (with at most three arguments, each not longer than 2 bytes) and "date". Any other request is ignored without responding with the "command failed" message.
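Both defenses reduce to small predicates that the server can apply before calling fork()/execvp(). The sketch below uses hypothetical function names (src_allowed, cmd_allowed) and assumes the request has already been parsed into a NULL-terminated argv[] array. It also assumes that "date" takes no arguments, which the handout does not state.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdint.h>
#include <string.h>

/* First defense: accept only clients whose address starts with 128.10.112.
   The address is in network byte order, as filled in by accept(). */
int src_allowed(struct in_addr a)
{
    uint32_t h = ntohl(a.s_addr);
    return (h >> 8) == ((128u << 16) | (10u << 8) | 112u);
}

/* Second defense: allow "ls" with at most three arguments of at most
   2 bytes each, and "date" (assumed here to take no arguments). */
int cmd_allowed(char *const argv[])
{
    if (argv == NULL || argv[0] == NULL)
        return 0;
    if (strcmp(argv[0], "date") == 0)
        return argv[1] == NULL;
    if (strcmp(argv[0], "ls") != 0)
        return 0;
    for (int i = 1; argv[i] != NULL; i++)
        if (i > 3 || strlen(argv[i]) > 2)
            return 0;
    return 1;
}
```

If either predicate returns 0, the server closes the connection without sending "command failed".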

Implement your client/server app in a modular fashion and provide a Makefile in v1/ to compile and generate remotecmd.bin and remotecmdc.bin. Create README under v1/ that lists the files and functions of your code and gives a brief description of their roles. Test your client/server app with multiple client processes and verify correctness. Even with the protection measures in place, do not keep the server running except when testing.


Problem 3 [120 pts]

As a continuation of Problem 3, lab2, modify the client/server file transfer app so that reliable file transfer is provided. To do so, implement a form of selective negative ACK where the client, ssftp, upon detecting a missing packet sends a negative ACK consisting of a 1-byte payload containing the sequence number of the missing packet.

At the server, ssftpd, after responding to a client requesting a file transfer, register a SIGPOLL (or SIGIO) signal handler that is invoked by the operating system when a new packet arrives. Use this asynchronous signal handler to retransmit missing packets. Since the 256000-byte maximum file size constraint allows the server to keep 1000-byte blocks (but for the last block) of a file in main memory, it is straightforward for the server to retransmit requested blocks. Until completion of reliable file transfer the signal handler ignores any packets that request a new file transfer. Upon completion of the file transfer, ssftpd unregisters the signal handler by resetting SIGPOLL to its default disposition.
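On Linux, one common way to have SIGIO delivered when a datagram arrives is to set the socket's owner with fcntl(F_SETOWN) and turn on O_ASYNC. The sketch below shows registration, deregistration, and a helper that locates a requested block in the in-memory file. All three function names are hypothetical, and whether to use sigaction() or signal() is left to you.

```c
#define _DEFAULT_SOURCE 1   /* exposes O_ASYNC on glibc under strict -std modes */
#include <fcntl.h>
#include <signal.h>
#include <stddef.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

#define BLOCK_SIZE 1000     /* data block size from the handout */

/* Ask the kernel to send SIGIO to this process when fd becomes readable. */
int enable_sigio(int fd, void (*handler)(int))
{
    struct sigaction sa;
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = handler;
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGIO, &sa, NULL) < 0)
        return -1;
    if (fcntl(fd, F_SETOWN, getpid()) < 0)
        return -1;
    int fl = fcntl(fd, F_GETFL);
    if (fl < 0)
        return -1;
    return fcntl(fd, F_SETFL, fl | O_ASYNC) < 0 ? -1 : 0;
}

/* Stop signal-driven I/O and restore the default disposition of SIGIO. */
int disable_sigio(int fd)
{
    int fl = fcntl(fd, F_GETFL);
    if (fl < 0 || fcntl(fd, F_SETFL, fl & ~O_ASYNC) < 0)
        return -1;
    return signal(SIGIO, SIG_DFL) == SIG_ERR ? -1 : 0;
}

/* Offset and length of block seq within a file of file_size bytes.
   Returns -1 if seq lies beyond the end of the file. */
int block_span(size_t file_size, unsigned char seq, size_t *off, size_t *len)
{
    size_t start = (size_t)seq * BLOCK_SIZE;
    if (start >= file_size)
        return -1;
    *off = start;
    *len = file_size - start < BLOCK_SIZE ? file_size - start : BLOCK_SIZE;
    return 0;
}
```

Inside the handler, calling recvfrom() with MSG_DONTWAIT in a loop until it fails is one way to drain all pending retransmission requests, since several arrivals may be coalesced into a single signal.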

At the client, ssftp, use a SIGALRM handler that is invoked every 100 msec to check if there are missing blocks, and, if so, send a 1-byte UDP packet to the server containing the sequence number of the missing packet to request retransmission. Thus data packets from the server are processed by the synchronous part of the client code that blocks on recvfrom() while the communication of retransmission requests is handled by the asynchronous part of the client driven by periodic 100 msec alarms. The client maintains two variables: (i) a sequence number, unsigned short seqmax, that specifies the highest sequence number of a data packet received plus 1, (ii) a sequence number, unsigned short seqmin, that specifies the lowest sequence number (strictly) below which all data blocks have been received. For example, at the start of a file transfer session seqmin = seqmax = 0. If a packet with sequence number 0 is received, the two indices are updated to seqmin = seqmax = 1 specifying that all blocks below sequence number 1 have been received and the highest sequence number of any data block received is 0. If no packets are dropped, seqmin and seqmax will be incremented in tandem.
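The handout does not prescribe how the periodic 100 msec alarm is armed. One possibility, sketched here under that assumption, is an interval timer set with setitimer(); ualarm() with a nonzero interval is another option. The function names start_nack_timer and stop_nack_timer are hypothetical. SA_RESTART is set so that a blocking recvfrom() in the synchronous code resumes after the handler returns instead of failing with EINTR.

```c
#define _DEFAULT_SOURCE 1   /* exposes sigaction()/setitimer() under strict -std modes */
#include <signal.h>
#include <stddef.h>
#include <string.h>
#include <sys/time.h>

/* Install handler for SIGALRM and fire it every 100 msec. */
int start_nack_timer(void (*handler)(int))
{
    struct sigaction sa;
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = handler;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = SA_RESTART;            /* restart interrupted recvfrom() */
    if (sigaction(SIGALRM, &sa, NULL) < 0)
        return -1;

    struct itimerval tv;
    tv.it_interval.tv_sec = 0;
    tv.it_interval.tv_usec = 100000;     /* period: 100 msec */
    tv.it_value = tv.it_interval;        /* first expiry after 100 msec */
    return setitimer(ITIMER_REAL, &tv, NULL);
}

/* Disarm the timer once the file transfer is complete. */
int stop_nack_timer(void)
{
    struct itimerval zero;
    memset(&zero, 0, sizeof zero);
    return setitimer(ITIMER_REAL, &zero, NULL);
}
```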

If a gap develops, e.g., seqmin = 5 and seqmax = 9, it indicates that all packets up to and including 4 have been received, as has packet 8. If the data structure at the client used to keep track of data blocks received indicates that packet 6 has been received, then the status of packets 5 and 7 is unknown. Therefore the asynchronous component of the client requests retransmission of packets 5 and 7, which assumes the worst-case scenario that both 5 and 7 have been lost. The two variables seqmin and seqmax are updated by the synchronous code that calls recvfrom(). Since all data blocks are kept in main memory until file transfer is complete, implementing selective negative ACK is straightforward.
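The bookkeeping in this example can be sketched with a per-block "received" array. The struct and function names (rx_state, rx_mark, rx_missing) are hypothetical, and the array size of 256 is an assumption that follows from 1-byte sequence numbers.

```c
#include <stddef.h>

#define MAX_BLOCKS 256   /* assumption: 1-byte sequence numbers */

struct rx_state {
    unsigned char  got[MAX_BLOCKS]; /* got[i] != 0 once block i has arrived */
    unsigned short seqmin;          /* all blocks below seqmin have arrived */
    unsigned short seqmax;          /* highest sequence number received, plus 1 */
};

/* Called by the synchronous code after recvfrom() returns a data packet. */
void rx_mark(struct rx_state *s, unsigned char seq)
{
    s->got[seq] = 1;
    if ((unsigned short)(seq + 1) > s->seqmax)
        s->seqmax = (unsigned short)(seq + 1);
    while (s->seqmin < s->seqmax && s->got[s->seqmin])
        s->seqmin++;                /* slide past every block already held */
}

/* Called from the SIGALRM handler: collect the sequence numbers in
   [seqmin, seqmax) that have not arrived. Returns how many were stored. */
size_t rx_missing(const struct rx_state *s, unsigned char *out, size_t cap)
{
    size_t n = 0;
    for (unsigned short i = s->seqmin; i < s->seqmax && n < cap; i++)
        if (!s->got[i])
            out[n++] = (unsigned char)i;
    return n;
}
```

With blocks 0 through 4, 6, and 8 marked, rx_missing() reports 5 and 7, matching the example above. Because the handler can interrupt rx_mark() between its updates, it may be prudent to block SIGALRM around the call.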

The client detects file transfer completion when seqmin exceeds file size. The server will conclude that file transfer has concluded if after sending the last data packet no retransmission request is received for 1.5 seconds. If a retransmission request is received, a new 1.5 second timer is set to determine file transfer termination. Note that this is an ad hoc heuristic that relies on normal communication not incurring latency greater than about 0.75 seconds. As such, it is buggy and imperfect. When we discuss TCP we will find that this is an intrinsic problem shared by all reliable transport protocols, including TCP, which implements a similar ad hoc rule for determining file transfer completion.

Implement your modified client/server app in v2/. Please provide Makefile and README. When debugging reliable transport, utilize control over packet drops by inserting test conditions in the server that prompt it not to transmit a data packet or to ignore a retransmission request. After establishing correctness, perform the benchmark runs of Problem 3, lab2, and compare the results. Did any packet drop occur without artificial intervention? If so, quantify the slow-down experienced by the reliable file transfer app compared to its unreliable counterpart. Discuss your findings in lab3.pdf.


Bonus problem [30 pts]

The Bonus Problem may be approached in the default way by utilizing 802.11 traffic traces provided, or by capturing your own WLAN traffic. In the default approach, you will find three 802.11 frame capture files, m2*.pcap, in the course directory. Use Wireshark to provide a coarse analysis of the captured traffic. Each file contains roughly a 30-second walltime 802.11 traffic trace captured at HAAS. Since two of the files are not small (about 10000-11000 frames), you may choose a subinterval of 0.5-second duration to analyze the data. If so, please specify for each file which interval you chose. The smaller file contains only about 500 frames. For each file, describe basic features of captured traffic such as which 802.11 network (e.g., 802.11g or 802.11a, frequency band) the frames belong to, basic service sets, types of frames (e.g., beacon, RTS/CTS), data rates (e.g., 6 Mbps indicates that more error correction is applied to protect against noise vs. 54 Mbps), among other factors that may be relevant. The aim is to provide a high level characterization of the observed WLAN networks that gives a synopsis of their structure and activities. To earn full credit, also describe signal levels (e.g., SNR) of high traffic basic service sets and the MAC addresses of their access points. In the last part of your analysis, compare your results across the three data sets, highlighting any differences.

The second approach entails capturing your own WLAN trace and providing an analysis similar to above. Capturing 802.11 traffic is not straightforward and is system dependent. For example, WLAN drivers in specific Linux, Windows, MacOS operating systems for specific 802.11 interfaces may not support monitor mode (analogous to promiscuous mode for WLANs). Even when supported, the driver may not export captured 802.11 frames but 802.3 frames that carry 802.11's payload but otherwise discard 802.11 specific frame information. The second approach is meaningful only if you have access to a device where you have a root (superuser, administrator) account and the system allows capture of 802.11 frames that may be inspected using Wireshark to analyze traffic. If you are following the second approach, please specify the system environment (OS, 802.11 interface/card, driver, when/where traffic was captured) in addition to providing your analysis. If considering the second approach, please keep in mind that you may -- after spending time to research -- determine that your system cannot provide raw 802.11 frames, or that doing so entails significant time investment to configure your environment (possibly involving coding) to enable 802.11 frame capture. If you are interested in Wi-Fi and wireless systems in general, the time spent may yield productive insight, but, otherwise, following the first approach is recommended.

The Bonus Problem is completely optional. It serves to provide additional exercises to understand the material. Bonus problems help you more readily reach the 45% contributed by the lab component to the course grade.


Turn-in Instructions

Electronic turn-in instructions:

We will use turnin to manage lab assignment submissions. Go to the parent directory of the directory lab3/ where you deposited the submissions and type the command

turnin -c cs536 -p lab3 lab3

You can check/list the submitted files using

turnin -c cs536 -p lab3 -v

This lab is individual effort. Please note the assignment submission policy specified on the course home page.

