CS 354 Spring 2021

Lab 3: Creation/Termination and Context-switching, Monitoring, and Dynamic Scheduling of Processes (300 pts)

Due: 03/24/2021 (Wed.), 11:59 PM

1. Objectives

The objectives of this lab are to handle corner cases in context-switching arising from process creation and termination, to instrument XINU to monitor system performance, and to implement dynamic scheduling.

2. Readings

Please use a fresh copy of XINU, xinu-spring2021.tar.gz, except for preserving the helloworld() function from lab1 and removing all code related to xsh from main(). Any code ported from lab2 is specified below.

3. Context-switching in XINU: creation and termination of processes [70 pts]

Corner cases are special cases that arise in context-switching of processes in an operating system that must be handled in a special manner for the system to function correctly. The two cases we will consider are context-switching in a new process and terminating (i.e., context-switching out and deleting) a running process.

3.1 Configuring the stack of a new process to work with ctxsw() [40 pts]

The discussion in the lectures of XINU's context-switching implementation centered on the assumption that the process to be context-switched in (i.e., "new" process in ctxsw()) had executed in the past and then context-switched out. Restoring the CPU state of the new process from the information in its stack relied on the process's context having been saved when it was context-switched out. This assumption obviously does not hold for a newly created process that is being context-switched in for the very first time. As discussed in the lectures, for ctxsw() to work correctly for a newly created process XINU's create() system call sets up a newly created process's run-time stack to make it appear as if it had been context-switched out in the past. That is, XINU implants fake memory into a newly created process for the purpose of allowing ctxsw() to work correctly for a new process that is about to "take its first breath."

To determine what create() must do to spawn a new process with a false past so that it can be context-switched in by ctxsw() when resched() decides to execute the newly created process, inspect the content of the stack of the NULL process after it has been context-switched out. To do so, create a process with higher priority than the NULL process which runs the code, void inspectnull(void), in system/inspectnull.c. inspectnull() outputs the content of the run-time stack of the NULL process (in hexadecimal format) starting at the top of the stack until the saved EBP, which is the first item stored by the NULL process when running ctxsw() as it is about to be context-switched out. Can we use these values to initialize the stack of a newly created process so that it works with ctxsw() when it is context-switched in? For each component output by inspectnull(), explain your reasoning. If a different value should be used for a component, specify what value and why. Make sure to clearly explain to whom ctxsw() returns in a newly created process and how this complies with CDECL. In the non-corner case where a process was context-switched out in the past, ctxsw() would return to resched() since resched() is the only kernel function in XINU that calls ctxsw(). Put your answer in lab3ans.pdf under lab3/.

3.2 What transpires when a running process terminates [30 pts]

The second corner case we will consider is when a process runs and executes its last instruction: by default, ret, translated by gcc from the C statement return. For example, if main is specified as the first argument of create() and the code of main() returns by executing ret, the question is where main() returns to. In XINU, this is the kernel function userret() which calls kill() to effect termination of the current process. To reduce the overhead of calling kill(), modify userret() so that it performs the chores needed to terminate the current process. You must include the four C statements from send() to freestk() in kill() which inform the parent that a child has terminated by sending a message, close the three device descriptors that point to CONSOLE, and free the stack memory occupied by the current process. Message communication, memory and device management will be discussed later in the course. All other code of userret() must be minimal in the sense of being necessary for correct functioning of userret(). Whereas kill() is a more general system call where one process can terminate another whose state can be ready, suspended, sleeping, etc., userret() is optimized to deal with termination of the current process after it has completed its last instruction.

Test and verify that the modified userret() works correctly. Describe in lab3ans.pdf what happens when userret() completes its work. userret() calls resched() so that the scheduler can pick a different process to run next. resched() performs return after calling ctxsw(). Does resched() not return to userret()? Explain. Unlike in Hollywood movies where the dead return without scientific basis, in operating systems it is incumbent upon us to ensure that this doesn't happen.

4. Monitoring process CPU usage [60 pts]

We will reuse the 1 msec tick counter from Problem 4, lab2, that uses the global variable, uint32 xclockticks, and the system call clkticks(), to support monitoring of process CPU usage. Define a new process table field, uint32 prcputicks, that is initialized to 0 in create() to keep track of the CPU time (in units of ticks) consumed by a process. The general idea is to start a tick timer when a process becomes current and is context-switched in. We stop the timer when the process is context-switched out. The time spent by a process in the current state (i.e., PR_CURR) is added to prcputicks. A new system call, syscall totcputicks(pid32 pid), in system/totcputicks.c returns the prcputicks value of the specified process, or SYSERR if the system call fails.

To measure CPU usage when a process becomes current, we will introduce a global variable, uint32 xprstarttick, to be declared in initialize.c. Inside resched(), just before calling ctxsw(), two tasks are performed:

(i) Update the old process's CPU usage. We update the CPU usage of the "old" process (i.e., the process being context-switched out) by subtracting xprstarttick (see (ii) for its definition) from xclockticks. If the difference is 0, i.e., the process has consumed less than 1 msec of CPU time, we set the difference to 1. That is, we round up when CPU consumption is less than 1 tick. Conversely, by taking the difference we round down when CPU usage exceeds 1 tick. This difference is added to the process's prcputicks process table field.

(ii) Start the new process's CPU usage. We start monitoring the CPU usage of the "new" process (i.e., the process being context-switched in) by copying the value of xclockticks into xprstarttick. Hence xprstarttick records the time (in units of ticks) when the new process becomes current.

Note that the NULL process is custom created, so special accommodation must be made to start its CPU usage measurement. Make sure this special case is not neglected, and specify in lab3ans.pdf your solution. Introduce a new system call, uint32 xcpuutil(void), in xcpuutil.c that returns the utilization of the Galileo's x86 CPU. To do so, subtract from xclockticks the NULL process's prcputicks, then divide this value by xclockticks. Finally, multiply by 100 to get a value between 0 and 100 which specifies the percentage of time that the CPU has spent running processes other than the NULL process.

Perform tests to verify that your kernel modifications are working correctly. Describe in lab3ans.pdf your method for doing so. The TAs will check your write-up to gauge soundness of your approach. They will carry out their own tests to determine if your CPU usage monitoring implementation works correctly.

5. Dynamic priority scheduling [170 pts]

5.1 Objective of operating system process scheduling

The default XINU scheduler is a static (or fixed) priority scheduler that uses the priorities specified in the create() system call and never changes them. XINU's process scheduler, resched(), when invoked, picks a highest priority ready process to run next. If there is a tie, round-robin scheduling is implemented among the group of highest priority processes. In computing systems where multiple processes share CPU resources, "fair" allocation of CPU cycles is the default goal. The notion of fairness can be given different technical meanings. For example, under equal share fairness, if there are N tasks needing to be scheduled on a shared resource such as a CPU, every task is expected to receive 1/N, i.e., an equal share, of CPU time. This notion of fairness is used in several resource allocation settings, including scheduling of messages, called packets, on routers, which are devices that packets pass through on their journey from source to destination. Traditionally, fair scheduling in the equal share sense has not been used in operating system process scheduling due to the overhead and difficulties associated with its implementation. The exception is today's Linux scheduler, which was revamped several years ago to follow the spirit of fair scheduling. Whereas the UNIX and Windows schedulers provide constant (i.e., O(1)) scheduling overhead, the Linux scheduler, called CFS (completely fair scheduler), incurs logarithmic overhead as a function of the number of processes in the system. The prior Linux scheduler, similar to UNIX and Windows, was a constant overhead scheduler.

The main issue with applying fair scheduling to operating system scheduling is that processes exhibit diverse traits: many are short-lived whereas a few are long-lived. Enforcing meaningful equal share fairness of CPU usage when processes are not created at the same time and have different CPU demands adds complications that detract from the simple idea of providing equal shares of CPU cycles. Chief among process traits is that some app processes tend to hog the CPU, meaning that they do little I/O. These processes are called CPU-bound processes. Other processes perform a lot of I/O, which often leads to blocking system calls (e.g., read()) that context-switch out the process until it becomes unblocked at a future time (e.g., a message that read() is blocking on has arrived). These processes, called I/O-bound processes, by their nature tend to require less CPU time than CPU-bound processes. Strictly enforcing equal share (e.g., a blocked process is not context-switched out but keeps occupying a CPU, thus wasting cycles) leads to inefficient resource utilization. An application may utilize a technique called polling to gain timeliness at the cost of resource wastage. In the above example, instead of making a blocking read() system call, a nonblocking read() call is made that returns immediately if a message is not available. Hence the process continues to run on the CPU, which will benefit the app process if a message were to arrive very soon. In general, due to resource wastage, polling is used in special circumstances only. Since CPU usage fairness is not a sufficient criterion that matches the demands of CPU- and I/O-bound processes with different CPU needs, fair scheduling in its simple and elegant form is not considered adequate for CPU scheduling in operating systems.

5.2 Differentiated treatment of CPU- vs. I/O-bound processes

Process scheduling in operating systems has been dominated by the need to provide differentiated treatment of CPU- and I/O-bound processes, recognizing that the latter intrinsically require significantly less CPU time. Scheduling aims to "compensate" I/O-bound processes for being less CPU hungry by assigning them higher priority over CPU-bound processes in order to enhance timeliness. This improves responsiveness of I/O-bound processes such as web browsers and games that are interactive and engage in frequent I/O. The specific scheduling algorithms employed by UNIX, Windows, and pre-2007 Linux operating systems that reward I/O-bound processes with reduced response time for consuming fewer CPU cycles vary. In this problem, we will follow the principle of differentiating CPU- and I/O-bound processes, and implement a simplified version of the UNIX Solaris scheduler. The simplification reduces the number of priority levels from 60 to 10, and uses only two time slice values. Note that the default Solaris scheduler discussed in class utilized six quantum values. For backward compatibility with XINU's legacy scheduler and to reduce coding, we will continue to use its readylist which implements priority queueing. Thus we choose not to exploit the constant overhead scheduling of Solaris, which requires implementing ten priority queues instead of one.

5.3 XINU TS scheduler

When implementing dynamic priority scheduling in XINU that follows the design of UNIX Solaris TS (time share) scheduler, a number of files containing code will need to be modified. Specify in a text file README.1 which files you have modified, with a brief synopsis (1-2 lines) of for what purpose. Place README.1 under lab3/. Although the amount of coding is kept low, what and where coding changes are made is important to track for clarity and checking correctness.

Priority range. XINU's TS scheduler, XTS, supports 10 + 1 priority levels: 1, 2, ..., 10 for regular processes, and 0 for the NULL process. Define macros MINPROCPRIO as 1 and MAXPROCPRIO as 10 in include/process.h. Use MINPROCPRIO and MAXPROCPRIO instead of constants 1 and 10 in your code. The NULL process is special in that its priority never changes. This ensures that XINU's NULL process will only run if there are no other ready processes in the system.

Time slice range. We will use two time slice values: 20 msec for processes with priorities 6 and higher, which correspond to I/O-bound processes, and 80 msec for the others, including the NULL process, which correspond to CPU-bound processes. Define these as constants QUANTUMIO and QUANTUMCPU, respectively, in process.h.

Priority of newly created processes. When a process is created using the create() system call, the third argument specifying the requested priority is ignored. Instead, a newly created process is assigned priority 6 which is to be defined as INITPROCPRIO in process.h. Hence a newly created process may find itself somewhere in the middle of the pack.

Classification of CPU- and I/O-bound processes. Instead of elaborate methods that may incur significant overhead, operating systems favor simple low-overhead methods for classifying processes as CPU- or I/O-bound. A popular method looks at the recent behavior of a process to gauge whether it is CPU- or I/O-bound: if a process makes a blocking system call and is context-switched out by the scheduler, it is considered as exhibiting I/O-bound behavior. If a process depletes its time slice, which leads to the scheduler being invoked, it is viewed as exhibiting CPU-bound behavior. Just because the scheduler is invoked does not imply that the process that depleted its quantum is context-switched out. If its priority is strictly greater than that of all other ready processes, it is given a fresh time slice and continues running. Time slice depletion is detected by clkhandler() in the lower half of the kernel. To reduce coding, we will use the blocking system call sleepms() as the representative of all other blocking system calls. Hence when a process calls sleepms() we will consider it as exhibiting I/O-bound behavior. Although we label processes "I/O-bound," whether a process actually engages in I/O is not the deciding factor. Instead, whether a process voluntarily relinquishes the CPU before depleting its time slice is.

Dynamic scheduling: priority promotion and demotion. When a process's last action that leads to invocation of XINU's scheduler resched() is time slice depletion, we will decrease its priority by 1 with a lower bound of MINPROCPRIO. If a process's last action was calling sleepms(), we will increase its priority by 1, upper bounded by MAXPROCPRIO. Hence a process that repeatedly calls sleepms() after becoming current will have its priority promoted. Conversely, a process that repeatedly depletes its time slice will have its priority demoted. It is the cumulative action of a process that determines whether the kernel will treat it as I/O-bound (high priority, small quantum) or CPU-bound (low priority, large quantum). A hybrid process may toggle back and forth in the mid-priority range. In some applications, a process that behaved as I/O-bound may change its stripes and turn into a CPU-bound process, or vice versa. In the former case, the small time slice assigned to I/O-bound processes prevents an I/O-bound process that morphs into a CPU-bound process from starving other processes.

5.4 Monitoring process behavior

Introduce two new process table fields, uint32 prnumvol and uint32 prnumdepl, which are initialized to 0 when a process is created. We will use the two fields to monitor and tally a process's overall tendency to behave in a CPU- or I/O-bound manner. We will increment prnumvol each time a process calls sleepms() to voluntarily relinquish the CPU. Increment prnumdepl each time a process depletes its time slice. By inspecting the two fields, we can assess a process's conduct over its lifetime.

5.5 Evaluation of XTS

Test applications. Code two test applications: void cpubound(void), which serves as a CPU-bound app, and void iobound(void), which acts as an I/O-bound app. The code of cpubound() in system/cpubound.c works as follows:

int i = 0, j;
while(xclockticks < 6000)  // loop until wall clock time of 6 sec is reached
        i++;               // consume CPU cycles performing ALU operations
j = getpid();
kprintf("cpubound: %d %d %d %d\n", j, totcputicks(j), proctab[j].prnumdepl, proctab[j].prnumvol);

Since xclockticks tracks system uptime in units of milliseconds, cpubound() keeps performing ALU operations until 6 seconds have elapsed, at which time it prints its PID, CPU usage, number of time slice depletions, and number of calls to sleepms().

The I/O-bound application, iobound() in system/iobound.c, works as follows:

int i = 0, j;
while(xclockticks < 6000) {  // until wall clock time of 6 sec is reached do
        for(j=0; j < CPUUPPER; j++) ;  // consume some CPU cycles before blocking
        sleepms(IOSLEEP);              // voluntarily relinquish the CPU
}
j = getpid();
kprintf("iobound: %d %d %d %d\n", j, totcputicks(j), proctab[j].prnumdepl, proctab[j].prnumvol);

The sleep interval IOSLEEP and loop bound CPUUPPER, both to be defined in iobound.c, will affect the degree to which iobound() behaves in an I/O-bound manner. Set the default value of IOSLEEP to 300 and CPUUPPER to 200000. We expect the CPU usage of iobound() to be significantly smaller than that of cpubound(), and its cumulative behavioral mode to be reflected in prnumvol and prnumdepl. You may vary IOSLEEP and CPUUPPER to affect the degree to which the I/O-bound process blocks, i.e., how I/O-bound the app is. If you deviate from the defaults, explain in lab3ans.pdf why.

Test scenario A. Create 4 CPU-bound processes from main() back-to-back. Assign the process running main() high priority so that it creates and readies all processes before terminating. To do so, call chprio() from the process running main() to set its priority to MAXPROCPRIO, which will suffice to create all test processes and ready them before terminating. Note that this is a hack for testing purposes, and in a kernel implementing isolation/protection such wanton change of process priority must be prevented. If your scheduler is implemented correctly, we would expect to see the 4 processes printing similar CPU usage, prnumdepl, and prnumvol values. Discuss your findings in lab3ans.pdf.

Test scenario B. Create 4 I/O-bound processes from main() and perform the same benchmark tests as scenario A. The four processes must run the same code with the same parameters IOSLEEP and CPUUPPER. Discuss your findings in lab3ans.pdf.

Test scenario C. Create 2 CPU-bound processes and 2 I/O-bound processes. We would expect the 2 CPU-bound processes to output similar values with respect to each other, and the same goes for the 2 I/O-bound processes with respect to each other. Across the two groups, we would expect CPU-bound processes to receive significantly more CPU time than I/O-bound processes. The values of prnumdepl and prnumvol should also reflect the different behavioral modes. Discuss your findings in lab3ans.pdf.

Test scenario D. In the previous test scenarios all test processes were created at about the same time which serves a useful purpose of evaluating scheduling performance. In practice, processes are created over time which can inject significant complications for fair scheduling. For example, a process that was created 15 minutes in the past and has accumulated significant CPU usage will not be able to compete with newly created processes that have zero CPU usage. Until newly created processes "catch up" with the old process and rack up significant CPU usage, the old process will be starved which is unacceptable in operating systems. For this and other reasons, Linux CFS has to introduce ad hoc mechanisms that diminish the simplicity and elegance of fair scheduling.

In test scenario D, make the process running main() spawn two CPU-bound child processes, then sleep for 3 seconds. After waking up, make the process spawn two additional CPU-bound processes. To do so, the priority of the process running main() may need to be set to 12 so that it is guaranteed to have the highest priority when it wakes up. Perform a thought experiment, i.e., simulate in your mind what approximate fraction of CPU usage you would expect the first two processes to receive, and similarly for the second group of two CPU-bound processes. Compare the observed output against your predicted results. Discuss your findings in lab3ans.pdf.

Bonus problem [25 pts]

Although the principles and methods implemented by XTS follow those of modern operating systems, the latter (and XTS) cannot guarantee that a ready process will not suffer prolonged starvation. For example, if a barrage of I/O-bound processes are continually created, then they may, as a group, starve CPU-bound processes. That is, even though an individual I/O-bound process blocks frequently and does not consume many CPU cycles, if there are many of them then collectively they may starve CPU-bound processes. Create a test scenario E where CPU-bound processes receive significantly fewer CPU cycles (near starvation) as a result of this crowding effect of I/O-bound processes. Run your test scenario and discuss your findings. Propose a mechanism in XTS (there is no need to implement it) that would mitigate starvation. Discuss the overhead of your proposed starvation prevention mechanism.

Note: The bonus problem provides an opportunity to earn extra credits that count toward the lab component of the course. It is purely optional.

Turn-in instructions

General instructions:

When implementing code in the labs, please maintain separate versions/copies of code so that mistakes such as unintentional overwriting or deletion of code are prevented. This is in addition to the efficiency that such organization provides. You may use any of a number of version control systems such as Git and RCS. Please make sure that your code is protected from public access. For example, when using Git, manage your code locally instead of on its on-line counterpart GitHub. If you prefer not to use version control tools, you may just use manual copying to keep track of the different versions required for development and testing. More vigilance and discipline may be required when doing so.

The TAs, when evaluating your code, will use their own test code (principally main()) to drive your XINU code. The code you put inside main() is for your own testing and will, in general, not be considered during evaluation.

If you are unsure what you need to submit in what format, consult the TA notes link. If it doesn't answer your question, ask during PSOs and office hours which are scheduled M-F.

Specific instructions:

1. Format for submitting written lab answers and kprintf() added for testing and debugging purposes in kernel code:

2. Before submitting your work, make sure to double-check the TA Notes to ensure that any additional requirements and instructions have been followed.

3. Electronic turn-in instructions:

        i) Go to the xinu-spring2021/compile directory and run "make clean".

        ii) Go to the directory where lab3 (containing xinu-spring2021/ and lab3ans.pdf) is a subdirectory.

                For example, if /homes/alice/cs354/lab3/xinu-spring2021 is your directory structure, go to /homes/alice/cs354

        iii) Type the following command

                turnin -c cs354 -p lab3 lab3

You can check/list the submitted files using

turnin -c cs354 -p lab3 -v

Please make sure to disable all debugging output before submitting your code.
