System Architecture
(INFT12-212 and 72-212)
Lab Notes for Week 8: Processes, Process Syscalls and
Process Scheduling
1 Introduction
For the second half of the subject, we are going to look at operating
system ideas and system calls in the lab sessions. Generally, we will
do this:
- look at Unix and Linux specific concepts, such as processes, files,
memory etc.
- look at the POSIX system calls which give user-mode programs access
to these concepts
- sometimes, look at some Java simulations of how an operating system
implements these concepts
This week, we are looking at the concept of processes.
2 The Unix Process Model
Unix has a family-oriented process model: only an existing process
can create a new process on the system. Once created, the pair are
known as the parent and child process. From there,
child processes can make their own children. Thus, you end up with
a family tree of related processes.
Here is an example family tree, created by the Linux tool pstree:
init-+-Xvnc
|-cron---cron-+-sendmail
| `-sh---cvsup
|-2*[getty]
|-inetd
|-lpd
|-sendmail
|-sshd
|-syslogd
|-xinit-+-XF86_Mach64
| `-fvwm-+-FvwmPager
| |-xbiff
| |-xclock
| |-xterm---tcsh-+-navigator-4.61.b---navigator-4.61.b
| | `-pstree
| `-xterm---tcsh
|-xntpd
`-xterm---ssh1
All processes have init as their ancestor. Init
is started when Unix boots, and is responsible for getting the system
going, restarting the getty login daemons, and waiting for
zombie processes (more on those later).
3 Multiprocess Capabilities
As mentioned before, Unix is a multiuser system and a multiprocess
system. In other words, Unix can appear to be running multiple programs
at the same time. Login on your Linux box, and then use SSH to connect
to another box, e.g. box14.
Once you are connected to box14, try this command out:
$ w
This shows much more information, such as the current time, the time
since reboot, the system load, how long users have been idle at their
terminal, and which process is attached to their terminal.
The load average is actually the average size of the Ready
to Run queue on the system. Three values are given: an average for
the last 1, 5 and 15 minutes. A load average of 1.0 says that 1.0
processes are ready to run, on average. A load of 3.5 indicates that
3.5 processes (on average) are competing for the CPU, and so the machine
is fairly loaded.
Loads above 5.0 are considered to be very high: on average, you programs
will take five times as long to run. A load below 1.0 shows that the
system is mostly idle: not many processes on the system are ready
to run.
4 Multitasking Capabilities
Unix also provides multitasking capabilities. You interact with Unix
via the shell, which reads the command lines that you type
and starts processes running to perform the command.
Unlike other systems, the Unix shell is just an ordinary process which
reads command lines and executes the commands. Anybody can write their
own shell. Historically, there have been several Unix shells, each
with their own features and functionality. If you have a dollar-sign
prompt, you are probably running the Korn shell (or perhaps the Bourne
or Bash shell). If you have a percent-sign prompt, you are probably
running the C-Shell (or perhaps the TC-Shell).
Therefore, for every command window you have open, there is at least
one process running on the server. You can easily start up another
command window t and get a second shell process. Let's look at some
commands to show these processes. The ps command shows information
about the processes running on the system.
$ ps
You should see some information about your shell. We need to give
some options to ps to get more information. Try
$ ps -ef | more
The output of ps is being piped to the more process
because there are screenfulls of output. The options to ps
get it to display all processes (-e), and give extra information
(-f):
- The user who started the process,
- The unique process-id of the process,
- The process-id of the parent process who started this process
up,
- The processor utilisation (obsolete),
- The time or date when the process started,
- The terminal to which this process is attached,
- The cumulative execution time for the process, and
- The command name and the arguments.
Try and find the entry for the running ps and more
commands, and see if they have a parent process. Now, using the parent
process-id, find which process(es) spawned the ps and more.
Also, think of a reason why many processes don't have a terminal.
Replace the more command with wc above to count
the number of processes on the system. Then get the current load average,
and work out what percentage of processes on the system are ready
to run; all the other processes are idle. This should help
to show how a single CPU can support such a high level of multitasking.
The ps command also allows you to choose what columns of
information should be displayed. For example, the command:
$ ps -e -o pri,vsz,pcpu,s,comm | more
shows the priority of each process, its virtual size, recent CPU usage,
state and its name.
Read the manual page on ps and work out how to use the -o
options to do other things. Also find out how to show the processes
for a particular user only.
Process state letters shown by ps are:
- O:
- The process is running.
- S:
- The process is blocked on I/O.
- R:
- The process is ready to run.
- I:
- The process is idle (blocked for > 30 seconds).
- Z:
- The process is a zombie.
- T:
- The process is being traced (debugged) by its parent.
- X:
- The process is waiting for more memory.
Which Unix process states correspond to the Running, Ready to Run,
and Blocked states that we saw in lectures? Which ones don't?
5 Foreground and Background Tasks
Why don't some processes require terminals? Well, imagine if you wanted
to run a Web server, but could not log out because the terminal would
be disconnected and the Web server would stop.
Unix has the concept of foreground processes which are attached
to a user's terminal, and can read from the terminal, and background
processes, which have no terminal to read from. The latter are obviously
useful for server-type operations, and the former for normal user-operations
(the shell, the editor etc.).
Recent versions of Unix also allow a process to be placed into a stopped
state, which is like the blocked state, but the process is waiting
on a `wakeup' signal from the user. The following state diagram shows
how you can move a process between these states:
Here are some examples:
$ sleep 100 # Wait for 100 seconds, then go back to shell
ctrl-C # Kill sleep process
$ sleep 100 & # Wait for 100 seconds in the background
$ jobs # Show list of jobs
[1] + Running sleep 100 &
$ stop %1 # Stop sleep process (job number 1)
$ jobs
[1] + Stopped (SIGSTOP) sleep 100 &
$ bg %1 # Put sleep in background
$ fg %1 # Bring sleep in foreground
If you had a Web server called webserv, what would be the
correct way to start this process up?
Try the command:
$ (sleep 10; ps -ef)
Explain what it does. Now run it in the background. What is different?
Run the command in the background again, and see how many long listings
(ls -l) you can do before you see the output from ps.
6 Process-related System Calls
6.1 Finding a Process' Identity
#include <sys/types.h>
#include <unistd.h>
int getpid(void)
int getppid(void)
getpid()
and getppid()
are the two system calls used to get the identity of a process. Getpid()
returns to a process its own unique process id. Getppid()
returns the process id of its parent. Neither system calls take any
arguments.
Question: Where are processes numbers recorded by the operating system?
6.2 The Fork System Call
#include <sys/types.h>
#include <unistd.h>
int fork(void)
Fork()
is the system call that creates a child process. It takes no arguments,
so you can't control what child is created. In fact, the child is
virtually identical to the parent process. The only difference
is they have different process ids! The fork() returns
0 to the child, or the child's process id to the parent, or -1 to
the parent if the fork() could not be done.
Question: What use is such a system call? You can't use it to run
another program, just an exact copy of the program you're currently
running.
We need to learn about the exec()
system call to see the underlying design.
6.3 The Exec System Call
#include <unistd.h>
int execlp(const char *file, const char *arg, ...)
There are several exec()
variants. Execlp() takes a file's name, and a list of command-line
arguments. When this system call is done, the current process is replaced
by loading the machine code from the named file. There is no new process:
the existing process undergoes a Dr. Jekyll/Mr. Hyde transformation.
For example, the code:
. . .
execlp("ls", "-l", "fred.c", mary.c", NULL);
. . .
printf("Just after the exec syscall\n");
would perform the command ls with the arguments -l
fred.c mary.c. If successful, the current process would be replaced
by the ls command, which would then start up as per usual.
In fact, the printf() operation would never happen!
Question: Why would the printf() operation never happen?
Execlp() returns a result, which always indicates a failure
to run the named command.
Question: Execlp() seems just as useless as fork():
we have to destroy one process just to run a different program. Why
two useless system calls?
6.4 Combining Fork and Exec
It's the combination of fork() and execlp() that
gives us the desired process creation functionality, with some extra
goodies. Imagine the common shell command ls -l > outfile,
to save the output of ls -l into outfile.
From the syntax, it appears that ls must understand the >
symbol, and internally open the outfile and send its output
there. It follows on that every Unix command would have to
also have this functionality. What a lot of code duplication!
In fact, we can use the fork()/execlp() system calls
within the shell, and place the file redirection functionality
in the shell. Here's how we could implement redirection in one place
only:
int pid; /* Holds the process-id of the child */
switch (pid = fork()) /* Call fork, and test what it returns */
{
case -1: printf("The fork failed\n"); return (-1);
case 0: /* The child is given a value of 0 from fork() */
/* Do special things like closing files etc. */
if (redirection required) {
close my standard output;
open the named output file as my standard output;
}
/* Replace ourselves with the real child */
/* Call execvp */
execlp(command, arg1, arg2, ..., NULL);
exit(1); /* Exec failed, indicate error */
default: /* The parent receives the new process-id from fork() */
wait(); /* Wait for child to exit */
}
After the fork(), we have two nearly identical processes.
The child identifies itself as such, and does any file redirection
work before it calls with execlp() to replace itself
with the real command to run. Thus the power of fork()/execlp().
Now, no Unix command has to deal with file redirection: the shell
does the work before any commands gets to start running.
A comment is required here. If fork() is nearly always followed
by an execlp(), then the OS must make a copy of the current
process (and duplicate all of its memory resources), and then overwrites
them nearly immediately. This seems to be a wasteful operation.
In modern Unixes, the solution is to use copy-on-write page
protections, which we will cover when we get to memory management.
6.5 The Exit and Wait System Calls
#include <stdlib.h>
void exit(int status)
Every process calls exit()
when they finish, to inform the parent that it has terminated, and
how. A status of 0 means no errors occurred. Any other value indicates
some error.
#include <sys/types.h>
#include <sys/wait.h>
int wait(int *status)
If the parent process wait()s
after the fork(), then it will be blocked until
the child dies. At that point, it will be made ready to run and the
status variable will hold the child's exit status. This is
how the shell finds out if the command you are running exits normally
or abnormally.
Here is a timeline of a normal Unix command execution with file redirection.
6.6 Zombie Processes
Imagine a parent process terminates, normally or abnormally. What
will happen when one of its children tries to terminate and report
its exit status back using exit()? There is no parent to
report to. In this situation, the child cannot fully terminate until
the exit() has completed: it is a zombie process.
In this situation, the exit status is sent to the init process.
Periodically, init gathers the exit values from any zombie
processes, and they disappear off the system.
6.7 Some Example Programs
Here are some example programs which show the use of fork(),
exec(), wait() and exit(). Download each
one, read through it, compile it and run it.
- onlyfork.c show what happens when a process
fork()s with no exec(). Compile with the command
cc -o onlyfork -g -Wall onlyfork.c , and run it with
the command ./onlyfork
- onlyexec.c show what happens when a process
exec()s with no fork(). Compile with the command
cc -o onlyexec -g -Wall onlyexec.c , and run it with
the command ./onlyexec
- forkexec.c show what happens when a process
fork()s and exec()s. Compile with the command
cc -o forkexec -g -Wall forkexec.c , and run it with
the command ./forkexec
- shell.c is an example of a Unix command-line
shell, which reads a command line, breaks it into the command plus
the arguments and then uses fork() and exec() to run it. Don't worry
about the code which parses the line, concentrate on the code which
uses fork(), exec(), wait() and exit(). Use
the same sort of compile command as above.
- I wrote this program, tinyshell.c, to
see how small I could make a shell. Compile this one with the command
cc -o tinyshell tinyshell.c
7 The Linux Scheduler
For those Linux heads, here are some good articles about the Linux
process scheduler:
8 Process Scheduling Simulator
On the Modern Operating System Simulators
web page, there is a scheduling simulator for Linux. Install the program
on your Debian box in the Linux Lab. This should be as simple as:
- download the simulator tarball from this local copy
- tar vxzf scheduling.tgz
- cd scheduling
Once you have the scheduing simulator installed and working, read
the user manual. The simulator
simulates first-come first-serve process scheduling with no pre-emption,
and processes have average run times before blocking, with a specific
amount of time blocked when they do block.
8.1 Activities
- Look at the original scheduling.conf file: explain what it means.
Explain how this scheduling.conf file produces this output:
Scheduling Type: Batch (Nonpreemptive)
Scheduling Name: First-Come First-Served
Simulation Run Time: 2877
Mean: 1100
Standard Deviation: 510
Process # | CPU Time | IO Blocking | CPU Completed | CPU Blocked |
|
0 | 1725 (ms) | 100 (ms) | 1725 (ms) | 17 times |
1 | 576 (ms) | 500 (ms) | 576 (ms) | 1 times |
2 | 576 (ms) | 30 (ms) | 576 (ms) | 19 times |
- Create a configuration file in which all processes run an average
of 2000 milliseconds with a standard deviation of zero, and which
are blocked for input or output every 500 milliseconds. Run the simulation
for 10000 milliseconds with 2 processes. Examine the two output files.
Try again for 5 processes. Try again for 10 processes. Explain what's
happening.
- For the Java programmers, modify the scheduling algorithm to perform
pre-emption. Add a configuration directive that sets the maximum quantum
that any process can have before being pre-empted. Put the process
back on the tail of the ready queue.
9 Outlook for the Next Lab
In the next lab, we will look at memory management.
File translated from
TEX
by
TTH,
version 3.85.
On 13 Jan 2012, 10:33.