2.9BSD/usr/net/sys/net/NOTES

Porting the UCB VAX network code to the PDP11 involves the following areas:

INSTRUCTION OVERLAYS

Partitioning the instruction space code into a number of 8K byte overlay
segments.  Currently I am using KLH's overlay scheme, just because it
doesnt involve compiler mods; for UCB distribution we will want
to use the UCB overlay scheme.  The partitioning could probably be more
optimized to avoid overlay switching (even though these are just resident
overlays).  Currently the overlays are organized along functional
groupings: interface drivers (if_xxx), ip/routing, tcp, user requests; this
could perhaps instead be sliced so that the most common (read/write)
functions plus the high-speed interface (if_il.c) are all in the same
overlay.

MBUF CLASSES

The VAX network code uses a pool of "mbuf"s for a number of different
classes of data: (a) dynamically created structures such as sockets, tcp
state blocks, routing table entries. (b) packet buffers. (c) packet buffers
with pointers threading thru them.

The PDP11 UNIX kernel does not have enough 16 bit addressable data space to
hold all of these items.  Typically the 8 memory management segments for
kernel data space (on an I/D machine) are broken down as follows:  SEG0-4:
kernel data structures, SEG5: movable pointer to external buffer pool,
SEG6: movable pointer to "_u" (user) structure of currently active process,
SEG7:  IO page.

STRUCTURE MBUFS

Mbuf packet buffers can indeed use SEG5 as a mechanism to access a large
external buffer pool.  However since these buffers are always mapped into
the same virtual address range (SEG5 = 0120000-0137777), they cannot be
used for dynamic structures and packets that point to other dynamic
structures and packets.  Instead a small (4K byte) area is permanently
resident in kernel D space and dynamically allocated and freed, using an
"malloc"-like scheme.  This provides enough structure space for
approximately 20 simultaneously active connections.

Where previously the code allocated an mbuf and setup a structure in it, it
now calls macros that use the appropriate pool if on the 11:

The old way structures were allocated and freed:
	struct inpcb *inp;
	struct mbuf *m;
	m = m_getclr(M_DONTWAIT);
	if (m == 0) goto bad;
	inp = mtod(m, struct inpcb *);
	...
	m_free(dtom(inp));

Using the "structure get" and "structure free" macros:
	struct inpcb *inp;
	MSGET(inp, struct inpcb, M_CLEAR);
	if (inp == 0) goto bad;
	...
	MSFREE(inp);

PACKET MBUFS

Some of the mbufs are simply used to pass packets between protocol levels;
such as between tcp_output and ip_output, or in a queue of mbufs handled by
"ipintr" (the ip_input software interrupt handler).  On the 11, the mbuf
headers are contained in the permanently mapped area; the mbuf header then
contains the click (mem management) address for the external buffer data.
Thus most of the VAX code remains unchanged.  Structures can contain mbuf
addresses and mbufs can be linked together as normal.  The "mtod" macro for
the 11 does the same as the VAX macro: it converts an mbuf address into a
pointer to an arbitrary structure using the m->m_off "offset" field in the
header.  Of course on the 11, this also involves diddling the mem
management register for SEG5, and the resulting pointer address is NOT
unique.  Therefore whereas the inverse operation, "dtom(inp)" (convert a
data to an mbuf address) is always well defined on the VAX, it is a risky
operation on the 11.  The 11 can only do this correctly if the
corresponding "mtod" occured previously and is still in effect.  Thus for
example "ip_reass(ip, fp)" and "ip_stripoptions(ip,mopt)" work alright
since the ip header refered to was recently mapped in using "mtod",
so that the "dtom"s inside ip_reass and ip_stripoptions will perform
properly.

At some time in the future one may want to go thru the code and
replace these cases with direct mbuf references (such as
ip_stripoptions(m,mopt)) to try and reduce or eliminate use of dtom.  At
least on the 11 this would result in more robust code.

PACKET MBUFS WITH THREADS

This is the hardest case to convert, although usually only a small
additional amount of code is needed.  For example in ip_reass, instead of
threading the ip packets together by reusing fields in the direct ip
header, a copy of the header is made in the permanently mapped area and the
header augmented to contain an mbuf pointer as well as pointers to other ip
(copy) headers.  In this region of code a small define was used to increase
portability of the dtom construct:
	#if pdp11
	#define DTOMQ (q->ipf_mbuf)
	#else
	#define DTOMQ dtom(q)
	#endif

MEMORY MANAGEMENT REGISTER CONSIDERATIONS

Most of the loading and restoring of the SEG5 register occurs automatically
as a result of the conditionalized "mtod" and "dtom" macros.  In certain
cases it is desired to save an existing map and restore it upon exit.
"MAPSAVE" and "MAPREST" macros (which are nops on the VAX), were setup for
this.

Another special case is in copying from one mbuf to another mbuf.  This
requires "borrowing" another memory management register on the 11.
Cases such as this were replaced with a general macro that works for VAX
or 11:  "MBCOPY(from_mbuf, from_offset, to_mbuf, to_offset, count);"

NEW SYSTEM CALL SUPPORT

System calls from 4.1a were added: gethostname, sethostname,
socket, connect, accept, send, receive, socketaddr, and select.
Select was the hardest as it required mods in the low level tty drivers
and a new ttynew.c canonical driver (ttyold.c no longer exists).
There is also a new pty.c (pseudo terminal) driver.

UNBUFFERED WRITES

Many PDP-11's are still running binaries using the original V7 stdio library
which defaults to unbuffered system write calls.  This creates an enormous
load when each byte output causes transmission of a network packet.
A mechanism from Purdue net was installed which buffers one byte writes
locally until either a timer expires or a length threshold is exceeded.

LONG VARIABLE AND DEFINE NAMES

The cpp from 4BSD was used which allows arbirary length defines.  Then
function, structure and variable names which were not 7 character unique
were remapped so that they were.  This avoids editing the source code and
allows staying in sync with new VAX versions.

UNSIGNED CHARS AND LONGS, FLOATS

Use of u_char and u_long must be examined on a case by case basis and
converted to the 11, which does not directly support these types.  A macro
"UCHAR(x)" was used for comparisons and assignments where the 11 sign
extension would normally screw up.  The macro is a nop on the VAX.

The exponential backoff algorithm in TCP used floats.  PDP11s sometimes
lack floating point hardware, and in any event the kernel would need
modifications to save and restore the floating point status.  Instead the
calculations were done in scaled integer arithmetic.

BYTE ORDERING AND ALIGNMENT

Long's on the 11 are short reversed to those on the VAX, so several macros
and functions needed changing.  All "host to net" and "net to host" code
that was conditionalized only for the vax, applies as well for the 11.
Also interestingly the struct in_addr union definitions are exactly the
same on the 11 as on the VAX, but the IN_NETOF and IN_LNAOF defines are
different, since these are affected by net byte order within a long.

Certain code in option processing assumes longs can lie on arbitrary byte
boundaries.  These assignments were changed to bcopy's.

PRINTF'S

Some %x's had to be changed to %X.

LONG ARGUMENT AND FUNCTION RETURN DEFAULTS

Argument and function type casts and declarations had to be more specific
for the 11, where the default is short.

OTHER POSSIBLE ORGANIZATIONS OF MEMORY

Another possibility considered for the organization of memory used by the
network code, was to move the entire network out into "user space" (but NOT
a user process).  On an I/D machine this could work quite well.  About 30K
bytes of I space would be used for instructions, leaving the other 30K
bytes available for expansion.  No instruction overlays would thus be
needed.  In D space SEG6 and SEG7 would point to the _u struct and IO page
(as normal in a kernel context).  This would allow easy argument passing
and device register access.  The remainder of D space (48K bytes) would
allow a pool of almost 200 mbufs, each 256 bytes long (twice the VAX mbuf
size, since the 11 doesnt support "cluster mbufs").  This would be a large
enough pool to allow the "mtod" and "dtom" constructs to work indentically
as the VAX, greatly simplifying the work necessary to port the code.

The system call interface routines (ipc.c) would reside in the kernel and
do "cross-address-space" subroutine calls to the net code in user space.
Since the stack is shared between the kernel and the net, this would be
relatively efficient.  There are about a dozen entry points each used by
the kernel calling the net and the net calling the kernel; these would be
intercepted to perform the cross-address-space manipulations required of
the memory management registers and PC.  Examples of kernel to net calls:
pffasttimeo, soioctl, soselect, net interface interrupts, socreate,
soaccept, etc.  Examples of net to kernel calls: sleep, psignal, iomove,
wakeup, selwakeup, printf.  There is also a small amount of socket data
that ipc.c routines try to access; this can be shared using argument and
return value exchanges with the net code.  Sleep arguments (net space
socket addresses) used by the net can be mapped to an unused portion of
kernel space (such as by ORing in the lower order bit).

Such a scheme however would not work well on a /23 or /34 class machine
since some elaborate user space SEG0-only trap-based text overlay system
would need to be developed to get enough space for the buffer pool.  At
least the chosen method has a possibility of working on a /23, and is
identical to the I/D method.

UNFINISHED BUSINESS

Routing entries for gateways (rtentry) work fine, but since they come out
of the permanently mapped area on the 11, there isnt room for too many of
them (more than 10 would be wasteful).  Therefore some type of special flag
needs to be added to the table that says: forward anything I dont know
about to address xxxx.  That way a VAX or some official gateway can worry
about the table space and update problems.

Pipes on the 11 still use the original scheme; to use the VAX socket pipe
protocol (pipes in mbufs instead of the block buffer cache) may require
more buffers than are practical on the 11 (that's one reason buffers are in
paged virtual memory on the VAX).  There are no user-visible effects
either way.

The tcp_debug array needs to be moved out of address space (it shrunk from
100 to 2 elements).

The 4K bytes of in address malloc space for dynamic structures is ok for an
I/D machine, but may be tight on a /34 or /23.  Not sure yet whether this
will squeeze in.  Could probably drop this size to 1 or 2K bytes, if light
activity.  The code also currently assumes a similar software interrupt
scheme as the VAX.  This can be simulated on a /23 with some small mods to
the mch.s process switch code.

If Interlan local net hardware is used, and some microcode mods made that
allow true low overhead scatter/gather IO, then the 11 can be almost as
efficient as the VAX in the way it performs device driver IO. (The VAX
plays tricks with the UBA map to make mbuf chains look like they are
contiguous).