System Architecture

(INFT12-212 and 72-212)
Lab Notes for Week 8: Processes, Process Syscalls and Process Scheduling

1  Introduction

For the second half of the subject, we are going to look at operating system ideas and system calls in the lab sessions. Generally, we will do this: This week, we are looking at the concept of processes.

2  The Unix Process Model

Unix has a family-oriented process model: only an existing process can create a new process on the system. Once created, the pair are known as the parent and child process. From there, child processes can make their own children. Thus, you end up with a family tree of related processes.
Here is an example family tree, created by the Linux tool pstree:
init-+-Xvnc
     |-cron---cron-+-sendmail
     |             `-sh---cvsup
     |-2*[getty]
     |-inetd
     |-lpd
     |-sendmail
     |-sshd
     |-syslogd
     |-xinit-+-XF86_Mach64
     |       `-fvwm-+-FvwmPager
     |              |-xbiff
     |              |-xclock
     |              |-xterm---tcsh-+-navigator-4.61.b---navigator-4.61.b
     |              |              `-pstree
     |              `-xterm---tcsh
     |-xntpd
     `-xterm---ssh1

All processes have init as their ancestor. Init is started when Unix boots, and is responsible for getting the system going, restarting the getty login daemons, and waiting for zombie processes (more on those later).

3  Multiprocess Capabilities

As mentioned before, Unix is a multiuser system and a multiprocess system. In other words, Unix can appear to be running multiple programs at the same time. Login on your Linux box, and then use SSH to connect to another box, e.g. box14.
Once you are connected to box14, try this command out:
  $ w

This shows much more information, such as the current time, the time since reboot, the system load, how long users have been idle at their terminal, and which process is attached to their terminal.

The load average is actually the average size of the Ready to Run queue on the system. Three values are given: an average for the last 1, 5 and 15 minutes. A load average of 1.0 says that 1.0 processes are ready to run, on average. A load of 3.5 indicates that 3.5 processes (on average) are competing for the CPU, and so the machine is fairly loaded.
Loads above 5.0 are considered to be very high: on average, you programs will take five times as long to run. A load below 1.0 shows that the system is mostly idle: not many processes on the system are ready to run.

4  Multitasking Capabilities

Unix also provides multitasking capabilities. You interact with Unix via the shell, which reads the command lines that you type and starts processes running to perform the command.
Unlike other systems, the Unix shell is just an ordinary process which reads command lines and executes the commands. Anybody can write their own shell. Historically, there have been several Unix shells, each with their own features and functionality. If you have a dollar-sign prompt, you are probably running the Korn shell (or perhaps the Bourne or Bash shell). If you have a percent-sign prompt, you are probably running the C-Shell (or perhaps the TC-Shell).
Therefore, for every command window you have open, there is at least one process running on the server. You can easily start up another command window t and get a second shell process. Let's look at some commands to show these processes. The ps command shows information about the processes running on the system.
  $ ps

You should see some information about your shell. We need to give some options to ps to get more information. Try
  $ ps -ef | more

The output of ps is being piped to the more process because there are screenfulls of output. The options to ps get it to display all processes (-e), and give extra information (-f): Try and find the entry for the running ps and more commands, and see if they have a parent process. Now, using the parent process-id, find which process(es) spawned the ps and more. Also, think of a reason why many processes don't have a terminal.
Replace the more command with wc above to count the number of processes on the system. Then get the current load average, and work out what percentage of processes on the system are ready to run; all the other processes are idle. This should help to show how a single CPU can support such a high level of multitasking.
The ps command also allows you to choose what columns of information should be displayed. For example, the command:
  $ ps -e -o pri,vsz,pcpu,s,comm  | more

shows the priority of each process, its virtual size, recent CPU usage, state and its name.
Read the manual page on ps and work out how to use the -o options to do other things. Also find out how to show the processes for a particular user only.
Process state letters shown by ps are:
O:
The process is running.
S:
The process is blocked on I/O.
R:
The process is ready to run.
I:
The process is idle (blocked for > 30 seconds).
Z:
The process is a zombie.
T:
The process is being traced (debugged) by its parent.
X:
The process is waiting for more memory.
Which Unix process states correspond to the Running, Ready to Run, and Blocked states that we saw in lectures? Which ones don't?

5  Foreground and Background Tasks

Why don't some processes require terminals? Well, imagine if you wanted to run a Web server, but could not log out because the terminal would be disconnected and the Web server would stop.
Unix has the concept of foreground processes which are attached to a user's terminal, and can read from the terminal, and background processes, which have no terminal to read from. The latter are obviously useful for server-type operations, and the former for normal user-operations (the shell, the editor etc.).
Recent versions of Unix also allow a process to be placed into a stopped state, which is like the blocked state, but the process is waiting on a `wakeup' signal from the user. The following state diagram shows how you can move a process between these states:
Figs/foreback.gif
Here are some examples:
  $ sleep 100                 # Wait for 100 seconds, then go back to shell
    ctrl-C                    # Kill sleep process
  $ sleep 100 &               # Wait for 100 seconds in the background
  $ jobs                      # Show list of jobs
  [1] +  Running                 sleep 100 &
  $ stop %1                   # Stop sleep process (job number 1)
  $ jobs
  [1] + Stopped (SIGSTOP)        sleep 100 &
  $ bg %1                     # Put sleep in background
  $ fg %1                     # Bring sleep in foreground

If you had a Web server called webserv, what would be the correct way to start this process up?
Try the command:
  $ (sleep 10; ps -ef)

Explain what it does. Now run it in the background. What is different? Run the command in the background again, and see how many long listings (ls -l) you can do before you see the output from ps.

6  Process-related System Calls

6.1  Finding a Process' Identity

     #include <sys/types.h>
     #include <unistd.h>

     int getpid(void)

     int getppid(void)

getpid() and getppid() are the two system calls used to get the identity of a process. Getpid() returns to a process its own unique process id. Getppid() returns the process id of its parent. Neither system calls take any arguments.
Question: Where are processes numbers recorded by the operating system?

6.2  The Fork System Call

     #include <sys/types.h>
     #include <unistd.h>

     int fork(void)

Fork() is the system call that creates a child process. It takes no arguments, so you can't control what child is created. In fact, the child is virtually identical to the parent process. The only difference is they have different process ids! The fork() returns 0 to the child, or the child's process id to the parent, or -1 to the parent if the fork() could not be done.
Question: What use is such a system call? You can't use it to run another program, just an exact copy of the program you're currently running.
We need to learn about the exec() system call to see the underlying design.

6.3  The Exec System Call

     #include <unistd.h>

     int execlp(const char *file, const char *arg, ...)

There are several exec() variants. Execlp() takes a file's name, and a list of command-line arguments. When this system call is done, the current process is replaced by loading the machine code from the named file. There is no new process: the existing process undergoes a Dr. Jekyll/Mr. Hyde transformation. For example, the code:
    . . .
    execlp("ls", "-l", "fred.c", mary.c", NULL);
    . . .
    printf("Just after the exec syscall\n");

would perform the command ls with the arguments -l fred.c mary.c. If successful, the current process would be replaced by the ls command, which would then start up as per usual. In fact, the printf() operation would never happen!
Question: Why would the printf() operation never happen?
Execlp() returns a result, which always indicates a failure to run the named command.
Question: Execlp() seems just as useless as fork(): we have to destroy one process just to run a different program. Why two useless system calls?

6.4  Combining Fork and Exec

It's the combination of fork() and execlp() that gives us the desired process creation functionality, with some extra goodies. Imagine the common shell command ls -l > outfile, to save the output of ls -l into outfile.
From the syntax, it appears that ls must understand the > symbol, and internally open the outfile and send its output there. It follows on that every Unix command would have to also have this functionality. What a lot of code duplication!
In fact, we can use the fork()/execlp() system calls within the shell, and place the file redirection functionality in the shell. Here's how we could implement redirection in one place only:
  int pid;                      /* Holds the process-id of the child */

  switch (pid = fork())         /* Call fork, and test what it returns */
  {
    case -1: printf("The fork failed\n"); return (-1);

    case 0:              /* The child is given a value of 0 from fork() */
                           /* Do special things like closing files etc. */

    if (redirection required) {
      close my standard output;
      open the named output file as my standard output;
    }

                                /* Replace ourselves with the real child */
                                /* Call execvp */
      execlp(command, arg1, arg2, ..., NULL);
      exit(1);                  /* Exec failed, indicate error */

    default:          /* The parent receives the new process-id from fork() */
      wait();                   /* Wait for child to exit */
}

After the fork(), we have two nearly identical processes. The child identifies itself as such, and does any file redirection work before it calls with execlp() to replace itself with the real command to run. Thus the power of fork()/execlp().
Now, no Unix command has to deal with file redirection: the shell does the work before any commands gets to start running.
A comment is required here. If fork() is nearly always followed by an execlp(), then the OS must make a copy of the current process (and duplicate all of its memory resources), and then overwrites them nearly immediately. This seems to be a wasteful operation.
In modern Unixes, the solution is to use copy-on-write page protections, which we will cover when we get to memory management.

6.5  The Exit and Wait System Calls

     #include <stdlib.h>

     void exit(int status)

Every process calls exit() when they finish, to inform the parent that it has terminated, and how. A status of 0 means no errors occurred. Any other value indicates some error.
     #include <sys/types.h>
     #include <sys/wait.h>

     int wait(int *status)

If the parent process wait()s after the fork(), then it will be blocked until the child dies. At that point, it will be made ready to run and the status variable will hold the child's exit status. This is how the shell finds out if the command you are running exits normally or abnormally.
Here is a timeline of a normal Unix command execution with file redirection.
Figs/forkexec.gif

6.6  Zombie Processes

Imagine a parent process terminates, normally or abnormally. What will happen when one of its children tries to terminate and report its exit status back using exit()? There is no parent to report to. In this situation, the child cannot fully terminate until the exit() has completed: it is a zombie process.
In this situation, the exit status is sent to the init process. Periodically, init gathers the exit values from any zombie processes, and they disappear off the system.

6.7  Some Example Programs

Here are some example programs which show the use of fork(), exec(), wait() and exit(). Download each one, read through it, compile it and run it.

7  The Linux Scheduler

For those Linux heads, here are some good articles about the Linux process scheduler:

8  Process Scheduling Simulator

On the Modern Operating System Simulators web page, there is a scheduling simulator for Linux. Install the program on your Debian box in the Linux Lab. This should be as simple as: Once you have the scheduing simulator installed and working, read the user manual. The simulator simulates first-come first-serve process scheduling with no pre-emption, and processes have average run times before blocking, with a specific amount of time blocked when they do block.

8.1  Activities

  1. Look at the original scheduling.conf file: explain what it means. Explain how this scheduling.conf file produces this output:
    Scheduling Type: Batch (Nonpreemptive)
    Scheduling Name: First-Come First-Served
    Simulation Run Time: 2877
    Mean: 1100
    Standard Deviation: 510
    Process # CPU Time IO Blocking CPU Completed CPU Blocked
    0 1725 (ms) 100 (ms) 1725 (ms) 17 times
    1 576 (ms) 500 (ms) 576 (ms) 1 times
    2 576 (ms) 30 (ms) 576 (ms) 19 times
  2. Create a configuration file in which all processes run an average of 2000 milliseconds with a standard deviation of zero, and which are blocked for input or output every 500 milliseconds. Run the simulation for 10000 milliseconds with 2 processes. Examine the two output files. Try again for 5 processes. Try again for 10 processes. Explain what's happening.
  3. For the Java programmers, modify the scheduling algorithm to perform pre-emption. Add a configuration directive that sets the maximum quantum that any process can have before being pre-empted. Put the process back on the tail of the ready queue.

9  Outlook for the Next Lab

In the next lab, we will look at memory management.


File translated from TEX by TTH, version 3.85.
On 13 Jan 2012, 10:33.