Warren's 2012 Summary of the Systems Architecture Subject
 
1  Basics of Computer Architecture
Different levels: user mode, kernel mode, hardware. What is at each
level. What is the interface to each level.
Hardware: ISA is the interface Operating system: system call API is
the interface.
von Neumann architecture, the instruction cycle Internals of CPU:
ALU, registers, control logic. 
Busses connect memory and CPU, and also device controllers.
Before von Neumann, computers were hardwired. von Neumann means instructions
can be stored in memory.
Memory: seen as an array of bytes, each one with an address. However,
bus size allow read/writes in multiples of bytes.
Three busses: address, data bus, control/status bus. What does each
one do, what their sizes mean to the CPU design.
 
2  Instruction Set Design Design
Bus sizes.
Why # registers.
Why # operations and which ones.
# operands and literal values.
Size, how many instruction format(s).
Main differences between CISC and RISC: note that RISC != fewer instructions.
Basic MIPS ISA: # regs, instruction formats & the fields in each
one, literals, basic set of instructions
 
3  Addressing Modes & Control Logic
RISC is generally a load/store architecture. CISC instructions can
work directly on RAM.
Addressing modes make things easier/harder for the programmer/compiler
writer. Each one helps to implement certain things: branches for loops/IFs,
jumps for function calls, base/index/offset addressing for structure
like arrays, objects, linked lists.
How the CPU works: some components are sequential (memory parts) ,
i.e. stateful, some components are combinatorial (ALU).
Instruction phases: fetch, decode, execute etc. PC controls where
to get the next instruction. IR holds the fetched instruction which
is decoded by the control logic.
Multiplexers take multiple inputs, only give 1 output, a control line
chooses which input.
Basic difference between hardwired and microcoded control logic. RISC
is generally hardwired, CISC generally microcoded.
 
4  Control/Flow of Execution, 
Branches used to implement IF and all the loops. Know how to translate
high level IF/FOR/WHILE/SWITCH into assembly code.
Function calls: know address of 1st instruction in the function. Need
to store our regs somewhere. Need to store return address somewhere.
Therefore, need a stack to hold this, especially for multiple function
calls (think recursion). 
The structure is known as a stack frame. Know the basic structure
of a stack frame: return address, stored regs, local vars, args to
next function.
Know basic memory layout of a process: code at bottom, global data
above code (can grow), stack at top (can grow).
Different ISA have different instructions to do function calls/returns.
Some (e.g. MIPS) jump you to the top of a function, you have to build
frame yourself. Other CPUs (e.g. VAX) build the frame and do everything.
 
5  User and Kernel Mode, System Calls, I/O, Exceptions
User/Kernel Mode: why both, why do we need at least 2? On some arch.
>2 modes (e.g. Intel has 4 rings).
What does each permit/deny. How to switch from each mode to the other:
syscalls, exceptions, interrupts. Special instruction to get back
to umode from kmode. Know how syscalls work.
I/O devices: mapped on the address bus, so appear as memory location.
Each device has a controller with own logic and registers. CPU read/writes
registers to get/send data, and also get/sent status/commands. 
Know basic idea of how to send cmds to a device to get it to do something,
how it will report status back in a status register.
Polling the status register: waste of CPU (busy waiting), so get device
to send interrupt to the CPU.
Know what is interrupt, how they work, how CPU gets sent to interrupt
handler, how it gets back to what it was doing, priority levels for
the interrupts, and what devices get highest priority.
Know how interrupts work, what causes them and why. OS can field the
exception and decide what to do about it. Attempt to do kmode stuff
from umode, page fault.
How we can use syscalls/interrupts/exceptions to make a virtual computer.
 
6  CPU Memory Management, Context Switching
Memory management at the hardware level.
Idea of pages: fixed sized lumps which are mapped to fixed size frame
of physical memory. 
Idea of address space: each process sees a different reality of the
layout of memory. Page mappings, offsets on each page that map to
the same offset of the frame.
Permissions on each page: what are they, what are they used for.
Context switching: how is it achieved, we get regular interrupts in
from some clock device (ticks), typically 100 or 1000 time a second.
On each tick, choose new process, save old P's registers and memory
map into the PCB, and reloads map & register for new one from its
PCB. Bigger map for a process => slower context switches.
 
7  Introduction to Operating Systems
Main tasks for an OS: hide the hardware details from the user mode
programs, provide a substitute in the way of abstract services, manage
the hardware resources of the computer.
Uses kmode and umode to do all this: stop umode programs from getting
access to all the resources.
The interface is the set of system calls the OS provides: its API.
Types of OS: mainframe, server, desktop, real-time, embedded. Know
the basic characteristics of each.
Basic abstractions that OS provides: process, logical memory, files,
windows, networking.
The syscall API is the interface: can build several OS that provide
the same API.
Different internal architectures: monolithic, client-server, microkernels.
Know the basic characteristics of each.
Understand the difference between the top-half and bottom-half of
a monolithic kernel.
 
8  Processes
Process: a sequence of computer instructions executing within a restricted
environment, i.e. logical address space with registers and access
to syscalls.
We call the environment the context of the process.
Know the logical memory layout of a process.
Know the main state diagrams for the different process models: embedded,
run to completion, run to block, run to block with pre-emption.
Know why we need pre-emption for interaction with the user (or clients
on a server).
Timeslice: how long a CPU-bound process can run before it loses the
CPU. Usually 1/1000 to 1/100 of a second.
Process Control Block (PCB): know what it stores.
Know what is a CPU-bound and an I/O-bound process.
Know the mechanics of a context switch.
Process scheduling: what are the main goals: fairness, efficiency,
response time, throughput, policy.
Know how all of these work: FCFS, static priority scheduling, dynamic
priority scheduling.
Know why starvation can occur, especially with static priority scheduling.
Know about the idle process.
 
9  Memory Management
Historical memory management schemes: bare machine, OS in ROM, partitions.
Partitions: base & limit registers, how they work to map a process'
address space to a part of physical memory.
Warren's concepts of logical and physical memory, and how they are
different.
Strategies to allocate partitions: first fit, best fit, worst fit,
why best fit is bad and worst fit is better.
Partitions suffer from external fragmentation, must copy to remove
the gaps.
Idea of pages: fixed sized lumps which are mapped to fixed size frame
of physical memory. 
Idea of address space: each process sees a different reality of the
layout of memory. Page mappings, offsets on each page that map to
the same offset of the frame.
Permissions on each page: what are they, what are they used for.
Page advantages: can share read-only pages, can share read-write pages
copy-on-write, but it costs a page fault exception when one process
touches the shared page, plus the copying. 
Page disadvantages: internal fragmentation, can only reduce with smaller
pages, but this costs in terms of context switching.
 
10  Virtual Memory, Disks
Paging: copying pages in/out of memory to disk. This is different
to a paged architecture.
We page out pages when we are running short of available memory to
allocate to process' memory requests.
Pages on the disk only have to be brought in when they are required,
so we can do lazy paging.
Idea of the working set: those pages that a process needs right now
to execute, this can be less than all the memory allocated to a process.
This is related to the locality of reference idea: over any short
interval, a process only uses a small fraction of its entire memory.
We need to keep the working set of all processes in memory, we can
choose to evict those pages not required, but we only do this when
we run short of memory.
Page replacement strategies: FIFO, NRU, Second Chance, LRU.
FIFO: choose the page allocated for the longest time to evict. Not
considered good, as this might be a heavily-used page.
NRU: classify all pages into 4 groups: A=0, M=0; A=0, M=1; A=1, M=0;
A=1, M=1. Choose a page from the lowest non-empty group.
Second Chance, a variation on FIFO. When a page is needed, look at
the oldest page. If it A bit is 0 (i.e it hasn't been accessed recently),
use it. Otherwise, set A to 0 and put it on the tail (as if it had
just been allocated). What this is doing is looking for a page that
has not been accessed in a long time.
LRU: considered the best strategy, but can be expensive to implement.
When a page is needed, pick the page unused for the longest time.
Impossible to do accurately, so it is always approximated. 
Paging can make a computer seem to have more physical memory than
it actually has, as allocated memory = RAM + pages on the disk. This
only works when we can fit the working set of all the processes in
physical memory.
If working set bigger than physical memory, we end up thrashing: moving
pages in and out to satisfy the current process memory accesses. Disks
are very slow, so thrashing slows down a computer terribly. Better
to buy more RAM to ensure enough RAM to hold all the working sets.
Disk: several platters with magnetic materials on each side. Several
arms move in/out together to reach a specific track.
All tracks at the same position on the platters are known as a cylinder.
Disks are physical devices. Seek time much bigger than rotation time
much bigger than transfer time.
We need to schedule the arm to avoid seeking as much as possible.
Several strategies:
FCFS: service the disk requests in the other that they are made. Leads
to lots of seeking so not good.
SSF: shortest seek first, after each seek, reorder the pending queue
to make the next seek the shortest. Better but can lead to starvation
of requests on the outermost/innermost tracks.
SCAN: also known as the elevator algorithm. When going from low to
high tracks, reorder the pending queue from low to high and do the
requests in order. When going from high to low, reorder the other
way. Improves seek times, and good property is that any requests only
has to wait 2 complete disk sweeps before it is serviced.
C-SCAN: SCAN but only service requests in ascending order. Makes the
algorithm simpler, and is also good because long seeks allow the arm
to accelerate, so the long sweep back from high to low also improves
seek times.
DMA: nothing to do with little green men. Tell the device controller
where to write data into memory, or where to read data from memory.
The device accesses memory across the busses like the CPU, which means
the CPU doesn't have to do any copying from the device's buffer for
I/O. This saves CPU time which can be used to do more user-mode instructions.
Device does the copy, and sends an interrupt when the I/O + copy is
completed.
 
11  File Systems and their Performance
A file: abstract idea, persistent storage for some data.
Files have contents and attributes: name, type, owner, permissions,
location in a directory, timestamps etc.
OS has to provide syscalls to access files: open, close, read, write,
create, remove, change attributes etc.
Directories are used to help group files logically. They only exist
to make life easier for humans.
A filesystem holds the data of the files, plus the metadata (attributes),
plus the directory structure.
We need to know: list of free blocks, list of blocks per file, file
metadata, directories. So a filesystem has an overhead, not all of
it can be used just to store file contents.
How to join attributes and file blocks:
Linux: name points to file attributes in an i-node, i-node points
to tree of disk blocks. We can have hard links: file with multiple
names, no name has preference over the other.
Windows: name and attributes are together as a directory entry, this
points to the first block in the FAT table.
Internal fragmentation: the last block of any file is not completely
full.
We want all the blocks of a file near each other, track-wise. We can't
use contiguous layout as this stops files from growing if they are
adjacent to other files. But non-contiguous layout can cause blocks
to be spread over the disk, requiring a defragmentation to bring them
back together.
How to store the free list of blocks. Could use a list, which starts
big but gets smaller as the disk is used up. List is big though. An
alternative is to use a bitmap, where each block is represented by
a bit, 0=free, 1=used. Bitmap is constant in size, usually much smaller
than a list.
FAT filesystem: a FAT is a table (array) with as many entries as blocks
on the disk. Each entry holds the number of the next block in a file,
or a special value meaning "no more blocks in the file", or
special values meaning free or bad.
A file is therefore represented as a singly-linked list of block numbers
in the FAT table. Good for going forward in the file, not good for
random access or going backward.
Historically, each entry was 12-bits long (max 1,024 blocks, good
for floppies), or 16-bit (65,536 blocks, OK for 30M disk drives),
now FAT-32 with 4 billion blocks, suits up to 2 terabyte filesystems.
System V Unix Filesystem: name separated from i-node which holds the
attributes. i-node holds list of first 10 blocks (good for small files).
After that, a block is allocated to hold the next 256 block numbers
in an array: a single-indirect block. After that, a double-indirect
block allocated to point to several single-indirect block. Ditto,
triple blocks. So Unix has a tree-structure to quickly find any disk
block used by the file, much better than a singly-linked list.
Link count of file names per i-node, so the file is deleted only when
all the filenames go away.
Problem of early Unix: all i-nodes placed on low track numbers, so
lots of seeking to get file attributes plus their contents.
Berkeley Fast Filesystem: place i-nodes across the disk, when starting
a new file choose an i-node close to the first disk block. Reduces
seek between them.
Lots of other advantages for the FFS: read the lecture notes!
We didn't spend time looking at file system reliability & performance.
Holey files: don't allocate blocks when their contents are all zeroes.
 
12  Synchronisation and Threads
Interprocess communication: we need this as processes often don't
act independently, so they need to be able to communicate.
Files can be used for IPC, but generally only when communication can
be time separated: write now, read later. Getting synchronised access
to a file at the same time with multiple processes is problematic.
Pipes: a buffer in the OS where one process writes into the buffer,
the other reads from it. Suitable for passing a stream of data between
processes. The OS can block each one if the pipe is empty or completely
full. Internet connections are effectively bi-directional pipes. Pipe
issues: a stream of data has no structure, so you need to insert markers
in the stream to mark separate sections.
Shared memory is the fastest IPC. Map pages shared read-write between
2 or more processes. A write by one process is seen immediately by
another process. Now we have to synchronise access to the shared data.
Race condition: a flaw in a system whereby the output or result is
unexpectedly and critically dependent on the sequence or timing of
other events (Wikipedia).
Critical section: a piece of code that accesses a shared resource
(data structure or device) that must not be concurrently accessed
by more than one thread of execution, otherwise it will lead to a
race condition (Wikipedia).
Make sure you understand why race conditions occur, and why a critical
section has to atomic to prevent race conditions.
To avoid a critical section, we need the following conditions:
-  No two processes can be inside the critical section simultaneously.
-  No assumptions are made about the relative speeds of each process
or the number of CPUs.
-  No process stopped outside its critical section should block another
process.
-  No process should wait arbitrarily long to enter the critical section.
Infinite timeslices: violates conditions 1 and 4.
Strict alternation: violates conditions 2 and 3, and requires busy
waiting.
Test and set lock instruction: good, but still requires busy waiting.
Semaphores: Dijkstra in 1965. Block a process trying to enter a critical
section already locked by another process. When the process leaves
the critical section, it unlocks it and the OS wakes up 1 blocked
process and allows it into the critical section.
Two semaphore primitives: acquire(semaphore) and release(semaphore).
Semaphores require programmers to remember to use them. Languages
can implicitly use them (monitors), so programmers don't have to remember.
The OS itself cannot use semaphores, as no section of the OS is a
process. The OS disables interrupts to ensure the CPU doesn't go off
elsewhere when it is inside a critical section. We have to trust the
OS programmers to get it right.
Threads: multiple execution units living inside an address space.
Each execution unit can be scheduled.
Understand why threads are useful: servers with multiple clients,
break up execution over several CPUs. One I forgot to mention is a
GUI process which has to keep the widgets on the window updated while
it also does some CPU-intensive work.
Each thread needs its own stack for its own stack frames. One problem
is how to keep the stack separate in the single address space. We
have to be careful not to call too many functions in a single thread,
or the thread's stack will collide with another thread's stack.
Also, now all the global data is shared, so we need semaphores etc.
to synchronise access.
Different thread models: user threads, kernel threads, lightweight
processes, mediumweight processes.
User threads: OS doesn't know about threads, just processes. A library
does the thread scheduling at user level: each thread has to co-operatively
give up the CPU so another thread in the process can run.
Kernel threads: the kernel does a lot of abstract stuff which isn't
directly related to the hardware, e.g. the filesystem. Run this code
as threads with kernel-mode permissions, means we can schedule it.
I didn't spend much time on this.
Lightweight processes: here the OS is thread-aware, and all threads
in a process are scheduled by the OS, and have full access to the
shared address space. Makes context switching between them faster.
Mediumweight processes: OS is thread-aware, but only the code and
global data pages are shared, each thread has its own stack page mappings.
This removes the potential stack collision problem, but makes context
switching a bit slower.
Heavyweight processes: these are normal processes, only the code is
shared read-only, and no read-write memory is shared.
File translated from
TEX
by 
TTH,
version 3.85.
On 31 Mar 2012, 12:36.