The objectives of this lab are to monitor Ethernet LAN traffic by capturing and analyzing Ethernet frames, and implement a semi-reliable file transfer app.
Read chapter 3 from Peterson & Davie (textbook).
This problem monitors Ethernet frames on Ethernet interface
veth0 on one of the amber machines in our HAAS G050 lab.
When monitoring/sniffing Ethernet frames, operating systems including Linux
require the interface operate in
promiscuous mode which, in turn, requires superuser privilege. On an amber
machine, run
% sudo /usr/local/etc/tcpdumpwrap-veth0 -c 24 -w - > ethfile-log
which will capture 24 Ethernet frames and save them into ethfile-log.
Enter your password when prompted.
Create subdirectory v1/ under lab3/ and place ethfile-log in v1.
tcpdumpwrap-veth0 is a wrapper of tcpdump that allows sudo execution.
Check the man page of tcpdump for available options.
To generate traffic arriving on veth0, use the ping app from
Problem 2, lab2, with the server running on an amber machine bound to
IP address 192.168.1.1.
The client is executed from the same machine using
% veth 'upingc secret 192.168.1.1 portnum micsec'
where veth
executes upingc at a machine with IP address 192.168.1.2.
Thus the ping client transmits/receives packets on interface 192.168.1.2, and
the ping server
receives/transmits traffic through interface 192.168.1.1. The address
192.168.1.1, which has been configured for veth0, is a private IP address
(as are the 10.*.*.* addresses of the amber machines) that is not routable
on the global IP Internet.
192.168.1.2 is the IP address at the opposite
end of veth0, providing an illusion (i.e., virtualization)
as if the two interfaces were connected by a physical point-to-point Ethernet link.
Running
% veth 'ifconfig veth0'
will show the configuration of veth0 at the opposite/remote end
whose IPv4 address is 192.168.1.2 as noted above.
For security reasons, we cannot perform sniffing on eth0, which is the interface of the lab machines and a shared resource. Therefore we use virtual/dummy interfaces in Linux that allow veth0 to be configured as a separate Ethernet interface -- albeit virtual, not physical -- with private IP address 192.168.1.1 that can reach 192.168.1.2, and vice versa. Running veth on an amber machine does not execute upingc on a different physical machine equipped with an Ethernet interface with IP address 192.168.1.2. Instead, both server and client run on the same physical machine, and packet forwarding is handled virtually by Linux as if 192.168.1.2 were a physical Ethernet interface on a separate machine. For our exercise, which aims to capture and inspect Ethernet frames, this will suffice.
After capturing the Ethernet frames, analyze ethfile-log using wireshark or tcpdump (both are popular analysis tools). Wireshark (/usr/bin/wireshark), the successor of ethereal, is a graphical tool for analyzing (as well as capturing) traffic logs in a format called pcap. Since we will be accessing the data in pcap files through the tools, the internals of the format can be ignored. Use wireshark or tcpdump to inspect the 24 captured Ethernet frames. Using the MAC address associated with 192.168.1.1 (perform 'ifconfig -a' on an amber machine) and the MAC address associated with 192.168.1.2 (perform veth 'ifconfig -a' on the same amber machine), identify the relevant Ethernet frames whose payloads are IPv4 packets that, in turn, carry UDP packets generated by our client/server ping app as payload. Inspect the type field of the captured Ethernet frames to determine if they are DIX (i.e., Ethernet II) frames.
The first 20 bytes of Ethernet payload make up the IP header (assuming no IP options are present), and the next 8 bytes the UDP header. The last 8 bytes of the IP header specify the source IP address and the destination IP address. Check their values against the IP addresses used by our ping app. Ignore the other fields of the IPv4 header. The first four bytes of the UDP header specify the source and destination ports. Check that they match the port numbers used by the client and server. Inspect the remaining bytes of the captured Ethernet frames, which comprise the application layer payload communicated between client and server by calling sendto(). Wireshark and tcpdump provide output in which header fields are already decoded and the raw payload is shown in hexadecimal/ASCII format. Discuss your findings in lab3ans.pdf.
Note: To run tcpdump, please use the command, % tcpdump -r - < ethfile-log, instead of, % tcpdump -r ethfile-log, which will trigger an "access denied" error. You may also run the command-line version of wireshark, /usr/bin/tshark, instead of tcpdump. To run wireshark, you will have to be physically at a lab machine. If you prefer, you may install wireshark on a Windows, MacOS, or Linux machine, copy ethfile-log to the machine and run wireshark to inspect captured frames.
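The byte offsets described above can be sketched in C as follows. This is only an illustration of the frame layout, not part of the required submission, since wireshark and tcpdump already perform this decoding. The struct udp_summary and the function summarize_frame() are hypothetical names, and the sketch assumes an Ethernet II frame with a 14-byte header and a 20-byte IPv4 header without options.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Assumed layout: Ethernet II header, IPv4 header without options, UDP header. */
enum { ETH_HDR_LEN = 14, IP_HDR_LEN = 20, UDP_HDR_LEN = 8 };

/* Hypothetical summary of the fields the lab asks you to inspect. */
struct udp_summary {
    uint16_t ethertype;    /* 0x0800 indicates an IPv4 payload */
    uint8_t  src_ip[4];
    uint8_t  dst_ip[4];
    uint16_t src_port;
    uint16_t dst_port;
    size_t   payload_off;  /* offset of the application payload in the frame */
};

/* Read a 16-bit value stored in network (big-endian) byte order. */
static uint16_t be16(const uint8_t *p)
{
    return (uint16_t)((p[0] << 8) | p[1]);
}

/* Returns 0 on success, -1 if the frame is too short or is not IPv4/UDP. */
int summarize_frame(const uint8_t *frame, size_t len, struct udp_summary *out)
{
    assert(frame != NULL && out != NULL);
    if (len < ETH_HDR_LEN + IP_HDR_LEN + UDP_HDR_LEN)
        return -1;

    out->ethertype = be16(frame + 12);       /* type field follows the two MACs */
    if (out->ethertype != 0x0800)
        return -1;

    const uint8_t *ip = frame + ETH_HDR_LEN;
    if ((ip[0] & 0x0f) != 5 || ip[9] != 17)  /* 5-word header, protocol 17 = UDP */
        return -1;
    memcpy(out->src_ip, ip + 12, 4);         /* last 8 bytes of the IP header */
    memcpy(out->dst_ip, ip + 16, 4);

    const uint8_t *udp = ip + IP_HDR_LEN;
    out->src_port = be16(udp);               /* first 4 bytes of the UDP header */
    out->dst_port = be16(udp + 2);
    out->payload_off = ETH_HDR_LEN + IP_HDR_LEN + UDP_HDR_LEN;
    return 0;
}
```

Comparing the values such a decoder would produce against what wireshark displays is one way to confirm that you are reading the hexadecimal dump correctly.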
Implement a UDP-based reliable file transfer protocol, tripf,
in v2/ that is suited for files transported in moderate loss but high bandwidth
network environments. The sender,
tripfs, is executed at a host with arguments
% tripfs <filename> <rcvip> <rcvport> <micsec> <blocksz>
where the first argument is the name of the file to be transported,
rcvip and rcvport are the receiver's IPv4 address (dotted
decimal) and port number, respectively.
The fourth argument specifies a time interval (in unit of
microsecond) that is used to pace the transmission of UDP packets.
The last argument specifies a block size (in unit of byte) that determines
how a file is split into equal-sized blocks (but for the final block if
file size is not a multiple of blocksz)
that are transmitted as separate UDP packets. Thus blocksz is the size of the
portion of a UDP packet's payload that carries the content of a file.
For simplicity, filename should be exactly 8 characters long (excluding
'\0') comprised of lower-case alphabet characters only.
The receiver, tripfr, is executed as
% tripfr <rcvport>
where rcvport specifies the port number that it binds to in order to
process incoming UDP packets.
Create a subdirectory v2/send/ where the sender is coded. Upon startup tripfs reads the bytes of the file specified as first argument into its main memory. After opening the file, tripfs determines its size and uses malloc() to allocate adequate heap memory. We will commence actual data transmission after the entire file has been read into main memory to reduce the impact of ancillary system overhead such as disk I/O and network I/O. Note that user home directories are mounted onto the local file systems of the lab machines which incurs network overhead through the actions of NFS (Network File System) when a file is read by tripfs. By placing a file in /tmp we can eliminate network I/O overhead but slow-down from disk I/O remains. By first reading the entire file into main memory we can reduce the influence of ancillary system overhead. We cannot, however, eliminate it since the virtual memory subsystem of Linux running on amber machines may, depending on file size and memory pressure, store part of the file in swap space on disk.
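A minimal sketch of reading an entire file into heap memory is given below. The function name load_file() is hypothetical, and the sketch determines the file size with fseek()/ftell(), which is one of several possible ways (fstat() is another). Error handling is reduced to returning NULL.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Reads the whole file at path into a malloc()ed buffer.
 * On success returns the buffer and stores its length in *size_out;
 * on failure returns NULL. The caller frees the buffer. */
unsigned char *load_file(const char *path, size_t *size_out)
{
    assert(path != NULL && size_out != NULL);

    FILE *fp = fopen(path, "rb");
    if (fp == NULL)
        return NULL;

    /* Determine the file size by seeking to the end. */
    if (fseek(fp, 0, SEEK_END) != 0) {
        fclose(fp);
        return NULL;
    }
    long end = ftell(fp);
    if (end < 0) {
        fclose(fp);
        return NULL;
    }
    rewind(fp);

    size_t size = (size_t)end;
    /* Request at least one byte so an empty file still yields a valid pointer. */
    unsigned char *buf = malloc(size > 0 ? size : 1);
    if (buf != NULL && fread(buf, 1, size, fp) != size) {
        free(buf);
        buf = NULL;
    }
    fclose(fp);

    if (buf != NULL)
        *size_out = size;
    return buf;
}
```

Data transmission would begin only after such a call returns, so that file I/O does not overlap with the timed portion of the transfer.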
After the file has been read into main memory, tripfs transmits three UDP packets containing the same payload to the receiver tripfr. The payload begins with the character '(' which indicates to the receiver that this is a control/management packet. This is followed by 4 bytes containing an unsigned int that specifies the file size (in unit of byte). The last two bytes specify the block size (in unit of byte). As in lab2, take care of byte order conversion to ensure correctness and, when possible, portability. Consecutive packet transmissions are paced by calling usleep() with the command-line argument micsec. If micsec is 0, your code should not call usleep() at all, to avoid system call overhead. After transmitting the third management packet, tripfs calls recvfrom() to wait on a response from the receiver. Before calling recvfrom(), an alarm is set to expire after 3 seconds. The alarm is caught by a signal handler, void alarmcb(int), coded in v2/send/alarmcb.c, which outputs a message to stdout indicating that the receiver is not reachable, then terminates tripfs. If a response from the receiver arrives before the timer expires, the alarm is cancelled. When recvfrom() returns, the response payload is verified to contain the single byte ')'. If the content is otherwise, a suitable error message is output and the sender terminates.
After setting up a SOCK_DGRAM socket, tripfr calls recvfrom() and waits on file transfer initiation from the sender. When recvfrom() returns, the first byte is verified to be '(', and file size and block size carried by the remaining bytes in the payload are retrieved for later use. If the format of the received payload is invalid, an error message is output to stdout and the receiver terminates. tripfr acknowledges receipt of initiation by transmitting 3 UDP packets with 1-byte payload ')' to the sender. The packets are spaced apart by 100 msec intervals. Before transmission of the first ACK packet, an alarm is set to be raised in 1 second. If the first data packet from the sender does not arrive within the 1 second interval, a suitable message is output to stdout and tripfr terminates. Otherwise, the alarm is cancelled and reception of data packets begins. Code the receiver in subdirectory v2/receive.
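The 7-byte initiation payload exchanged between sender and receiver can be sketched as follows. The helper names pack_ctrl() and unpack_ctrl() and the constant CTRL_LEN are hypothetical. The sketch assumes both fields are carried in network byte order via htonl()/htons(), and uses memcpy() to avoid unaligned accesses.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

enum { CTRL_LEN = 7 };  /* '(' + 4-byte file size + 2-byte block size */

/* Sender side: fill buf with the initiation payload. */
void pack_ctrl(unsigned char buf[CTRL_LEN], uint32_t filesize, uint16_t blocksz)
{
    assert(buf != NULL);
    uint32_t fs = htonl(filesize);  /* convert to network byte order */
    uint16_t bs = htons(blocksz);
    buf[0] = '(';
    memcpy(buf + 1, &fs, sizeof fs);
    memcpy(buf + 5, &bs, sizeof bs);
}

/* Receiver side: validate the payload and recover both fields.
 * Returns 0 on success, -1 if the length or leading byte is wrong. */
int unpack_ctrl(const unsigned char *buf, size_t len,
                uint32_t *filesize, uint16_t *blocksz)
{
    assert(buf != NULL && filesize != NULL && blocksz != NULL);
    if (len != CTRL_LEN || buf[0] != '(')
        return -1;
    uint32_t fs;
    uint16_t bs;
    memcpy(&fs, buf + 1, sizeof fs);
    memcpy(&bs, buf + 5, sizeof bs);
    *filesize = ntohl(fs);          /* convert back to host byte order */
    *blocksz = ntohs(bs);
    return 0;
}
```

In this sketch the sender would pass the packed buffer to sendto() three times, and the receiver would call unpack_ctrl() on whatever recvfrom() returns, treating a -1 result as an invalid format.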
After receiving an acknowledgement packet from the receiver (duplicate ack packets are ignored) in response to file transfer initiation in 2.1, data transmission commences. File content read into main memory is copied into the payloads of UDP packets, each carrying blocksz bytes of file content but for the last packet which may carry fewer if blocksz does not divide file size. The first 4 bytes of each payload contain a sequence number (unsigned int) that identifies which part of the file the data portion of the payload contains. Each data block is transmitted thrice with the same sequence number, 3-fold redundancy being used as a form of forward error correction (FEC). All UDP packets transmitted by tripfs are separated by time interval micsec. Before starting file data transmission the sender calls gettimeofday() and does so again after transmitting the last data packet. tripfs outputs their difference (in unit of msec) to stdout before terminating. The sender does not wait on confirmation from the receiver that the file has been correctly received.
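The data packet layout and the timing calculation described above might look like the following sketch. The names num_blocks(), build_data_packet(), and elapsed_msec() are hypothetical, and carrying the sequence number in network byte order is an assumption made here for consistency with the initiation packet.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <sys/time.h>

enum { SEQ_LEN = 4 };  /* bytes used by the sequence number */

/* Number of blocks needed to carry filesize bytes (rounds up). */
uint32_t num_blocks(uint32_t filesize, uint16_t blocksz)
{
    assert(blocksz > 0);
    return filesize / blocksz + (filesize % blocksz != 0);
}

/* Writes sequence number seq followed by the matching file block into pkt.
 * pkt must have room for SEQ_LEN + blocksz bytes.
 * Returns the total payload length to pass to sendto(). */
size_t build_data_packet(unsigned char *pkt, const unsigned char *file,
                         uint32_t filesize, uint16_t blocksz, uint32_t seq)
{
    assert(pkt != NULL && file != NULL);
    assert(seq < num_blocks(filesize, blocksz));

    size_t off = (size_t)seq * blocksz;
    size_t n = filesize - off;      /* bytes remaining from this block onward */
    if (n > blocksz)
        n = blocksz;                /* only the last block may be shorter */

    uint32_t net = htonl(seq);      /* assumed: network byte order */
    memcpy(pkt, &net, SEQ_LEN);
    memcpy(pkt + SEQ_LEN, file + off, n);
    return SEQ_LEN + n;
}

/* Difference between two gettimeofday() readings in milliseconds. */
double elapsed_msec(const struct timeval *start, const struct timeval *end)
{
    return (end->tv_sec - start->tv_sec) * 1000.0 +
           (end->tv_usec - start->tv_usec) / 1000.0;
}
```

A sender loop built on this sketch would call build_data_packet() once per sequence number and transmit the resulting payload three times, pacing each transmission by micsec.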
After completing initiation in 2.2, tripfr allocates heap memory to hold the contents of the file to be received. As at the sender, our goal is to reduce the influence of disk I/O and other ancillary system overhead when gauging network protocol performance by using main memory to store the received file content. When the first data packet (i.e., sequence number 0) is received gettimeofday() is called, and upon receiving the last data packet (i.e., sequence number one less than file size / block size, rounded up, since sequence numbers start at 0) gettimeofday() is called again. The difference (in unit of msec) is remembered and later output to stdout. Duplicate data packets (i.e., payload containing the same sequence number) are ignored. tripfr monitors whether packets are missing by counting how many packets (up to 3) of the same sequence number are received. To do so, a 1-D array, unsigned int *datapack, with one entry per data block (the number of blocks is calculated after step 2.2) is allocated using malloc() and initialized to 0. When a data packet containing sequence number k is received, datapack[k] is incremented. After the last data packet has been received, during the postprocessing stage where performance metrics are calculated and the file content is written to disk, datapack is traversed to determine if any data blocks are missing. If so, their sequence numbers are output to stdout along with how many data blocks are missing. The content of the file received is output to a file "received_file" in the current directory. When testing use the diff command to check that the sent and received files are the same.
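The bookkeeping with datapack can be sketched as below. The helper names note_arrival() and report_missing() are hypothetical, as is the exact wording of the output. The parameter nblocks is assumed to be the number of data blocks computed from file size and block size.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Records the arrival of a packet with sequence number seq.
 * Returns 1 if this is the first copy (its data should be stored),
 * 0 if it is a duplicate or seq is out of range. */
int note_arrival(unsigned int *datapack, unsigned int nblocks, unsigned int seq)
{
    assert(datapack != NULL);
    if (seq >= nblocks)
        return 0;
    datapack[seq]++;
    return datapack[seq] == 1;
}

/* Traverses datapack, prints each missing sequence number and the total,
 * and returns the number of missing blocks. */
unsigned int report_missing(const unsigned int *datapack, unsigned int nblocks)
{
    assert(datapack != NULL || nblocks == 0);
    unsigned int missing = 0;
    for (unsigned int k = 0; k < nblocks; k++) {
        if (datapack[k] == 0) {
            printf("missing block %u\n", k);
            missing++;
        }
    }
    printf("%u block(s) missing\n", missing);
    return missing;
}
```

Checking the return value of note_arrival() before copying data into the file buffer is one way to ignore duplicates while still counting them.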
If tripfr receives the last expected packet (i.e., one that contains the maximum sequence number calculated at step 2.2) then it knows that it can proceed to postprocessing, which includes writing the file content in main memory to disk. However, if none of the three copies of the last data packet arrive at the receiver, a different mechanism is needed to prevent the receiver from waiting indefinitely. Starting and terminating data transfer sessions correctly is a nontrivial problem which we will discuss when investigating TCP. For lab3, we will implement a working solution (i.e., "hack") where tripfr, upon receiving the first data packet, sets a timer to go off after 7 seconds. In all our tests, we will use file sizes, block sizes, and time intervals between packets such that successful file transfer completion should take less than 7 seconds. Hence, if none of the three final data packets make it to the receiver, a SIGALRM handler will output a suitable message to stdout and terminate tripfr.
To debug, test, and help benchmark the file transfer app, we will introduce
controlled losses at sender-side where some packets may not be transmitted.
When tripfs is executed it reads from a text file, param.dat, an integer followed
by up to
5 pairs of integers. If the first integer is -1 then no packets are
artificially dropped at the sender during file transfer. Otherwise,
the integer is interpreted as specifying how many pairs of integers will follow.
The pairs of integers will be used at tripfs to
selectively discard (i.e., not call
sendto()) data packets. The rule for doing so is illustrated
by the next example. Suppose param.dat contains
3
20 1
105 2
440 1
Assuming the file comprises at least 441 data blocks,
the first integer pair specifies that for data packet with sequence
number 20 (sequence numbers start at 0), one of the three copies
is not transmitted. The second pair instructs that for sequence number
105 two of the three duplicates are discarded at the sender. The
last pair specifies that for sequence number 440 one of the three
copies is discarded. Note that if the second value in an integer
pair is 3 then it implies that all three packets for a sequence number
are discarded guaranteeing that correct file transfer will fail.
To implement selective packet drops in a modular fashion, code a function, int dropornot(int), in v2/send/dropornot.c, where the argument of dropornot() specifies a sequence number. dropornot() returns 0 if the current packet should be transmitted, 1 if it is to be discarded by not calling sendto(). Use a 2-D integer array, int packselect[5][2], where the pairs of numbers from param.dat are stored and used by dropornot() to determine its return value.
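One possible shape for dropornot() and for loading param.dat is sketched below. The names load_params() and numselect are hypothetical. The handout does not say which of the three copies should be dropped, so this sketch assumes the earliest copies are dropped and uses the second column of packselect as a countdown of drops still owed.

```c
#include <assert.h>
#include <stdio.h>

int packselect[5][2];  /* {sequence number, copies still to be dropped} */
int numselect = 0;     /* hypothetical: number of valid rows in packselect */

/* Reads the param.dat format from fp.
 * Returns 0 on success, -1 if the contents are malformed. */
int load_params(FILE *fp)
{
    assert(fp != NULL);
    int n;
    numselect = 0;
    if (fscanf(fp, "%d", &n) != 1)
        return -1;
    if (n == -1)
        return 0;      /* no artificial drops */
    if (n < 0 || n > 5)
        return -1;
    for (int i = 0; i < n; i++) {
        if (fscanf(fp, "%d %d", &packselect[i][0], &packselect[i][1]) != 2)
            return -1;
    }
    numselect = n;
    return 0;
}

/* Returns 1 if the current copy of sequence number seq should be
 * discarded (sendto() is skipped), 0 if it should be transmitted. */
int dropornot(int seq)
{
    for (int i = 0; i < numselect; i++) {
        if (packselect[i][0] == seq && packselect[i][1] > 0) {
            packselect[i][1]--;  /* one fewer drop owed for this sequence number */
            return 1;
        }
    }
    return 0;
}
```

With the example param.dat above, the first call to dropornot(20) returns 1 and the next two return 0, so two of the three copies of block 20 are transmitted.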
Using files of different sizes, block sizes, and packet spacing intervals, first test your app to assess correctness. Start with param.dat containing -1 (no packets are discarded at the sender), then move on to controlled drops. Note that packet drops can arise naturally in our lab environment where the shared amber machines are connected by Ethernet switches. However, this is unlikely to happen unless traffic load generated across lab machines creates congestion at hot spots. For testing purposes, cap block size at 1400 bytes which, with 4-byte sequence number overhead, is well below the maximum payload size of Ethernet frames. Carry out standard correctness checks, which include dropping the last packet. As discussed in class, in general, establishing correctness is the most difficult and time consuming part of software development and testing. We have the luxury of not building production software to be released to the outside world. Hence your tests need only include the usual suspects that most competent software engineers would likely all identify.
After gauging correctness, turn to evaluating performance by varying block size among 600, 1000, and 1400 bytes and observing the resultant file transfer completion time. Without introducing artificial losses, try optimizing performance by reducing packet spacing and using maximum block size (capped at 1400 bytes). Discuss your findings in lab3ans.pdf.
The Bonus Problem may be approached in the default way by utilizing the 802.11 traffic traces provided, or by capturing your own WLAN traffic. Use one of the two approaches, not both. In the first approach, you will find three 802.11 frame capture files, m2*.pcap, in the course directory. Use Wireshark to provide a coarse analysis of the captured traffic. Each file contains roughly a 30-second 802.11 traffic trace captured in HAAS. Since two of the files are not small (about 10000-11000 frames), you may choose a subinterval of 1-second duration to analyze the data. If so, please specify for each file which interval you chose. The smaller file contains only about 500 frames. For each file, describe basic features of captured traffic such as which 802.11 network (e.g., 802.11g or 802.11a, frequency band) the frames belong to, basic service sets, types of frames (e.g., beacon, RTS/CTS), data rates (e.g., 6 Mbps indicates that more error correction is applied to protect against noise vs. 54 Mbps), among other factors that may be relevant. The aim is to provide a high level characterization of the observed WLAN networks that gives a synopsis of their structure and activities. To earn full credit, also describe signal levels (e.g., SNR) of high traffic basic service sets and the MAC addresses of their access points. In the last part of your analysis, compare your results across the three data sets, highlighting any differences.
The second approach entails capturing your own WLAN trace and providing an analysis similar to above. Capturing 802.11 traffic is not straightforward and is system dependent. For example, WLAN drivers in specific Linux, Windows, MacOS operating systems for specific 802.11 interfaces may not support monitor mode (analogous to promiscuous mode for WLANs). Even when supported, the driver may not export captured 802.11 frames but rather 802.3 frames that carry the 802.11 payload and discard 802.11 specific frame information. The second approach is meaningful only if you have access to a device where you have a root (superuser, administrator) account and the system allows capture of 802.11 frames that may be inspected using Wireshark to analyze traffic. If you are following the second approach, please specify the system environment (OS, 802.11 interface/card, driver, when/where traffic was captured) in addition to providing your analysis. Please keep in mind that you may -- after spending time to research -- determine that your system cannot capture raw 802.11 frames, or that doing so entails significant time investment to configure your environment (possibly involving coding) to enable 802.11 frame capture. If you are interested in Wi-Fi and wireless systems in general, the time spent may yield productive insight, but, otherwise, following the first approach is straightforward and recommended.
The Bonus Problem is completely optional. It serves to provide additional exercises to understand the material. Bonus problems help you more readily reach the 40% contributed by the lab component to the course grade.
Electronic turn-in instructions:
i) For problems that require answering/explaining questions, submit a write-up as a pdf file called lab3ans.pdf. Place lab3ans.pdf in your directory lab3/. You may use any editor that can export pdf files, which several freeware editors do. Files submitted in any other format will not be graded.
ii) We will use turnin to manage lab assignment submissions. Please check that the relevant source code, including Makefiles, is included in the relevant subdirectories of lab3. In the parent directory of lab3, run the command
turnin -c cs422 -p lab3 lab3
You can check/list the submitted files using
turnin -c cs422 -p lab3 -v