On Jan 12, 2020, at 3:40 PM, Jon Steinhart
<jon(a)fourwinds.com> wrote:
Kevin Bowling writes:
I am regularly surprised by how surprising type systems are to
computing professionals.
I am currently astonished at this. Unfortunately, I need to make a hopefully
minor change to the linux kernel to support something that I want to do in my
current project. But, this is my first time looking at the internals which is
way different that my recollection of UNIX kernels. It's being enough of an
adventure that I'm writing up a travelogue of my journey through the code.
While I swore that I was done writing books this is sure looking like another
one :-)
So I came across this piece of what I consider to be bad programming that's
all over the place...
One of my programming style rules is to program in the language in which you're
programming. The canonical example of not doing this is the Bourne shell which
was originally written using macros to redefine C to look like Algol68.
Linux contains several sets of list_for_each_entry() macros that are essentially
obfuscated for loops that generate inefficient code. To make things worse, the
way that they're implemented is by embedding list_head structures into other
structures.
In the diagram below, the labels above boxes are structure names. Names inside
of boxes are structure member names. super_blocks, s_list, s_mounts, and
mnt_instance are all list_head structures. (Trick question, how many lines are
in this diagram :-) )
+-----------------------------------------+
| super_block |
| +--------------+ +----------+ |
+->| super_blocks |<->| s_list |<- ... -+
+--------------+ +----------+
| | mount mount
| ... | +--------------+ +--------------+
| | | ... | | ... |
+----------+ +--------------+ +--------------+
+->| s_mounts |<->| mnt_instance |<->| mnt_instance
|<- ... -+
| +----------+ +--------------+ +--------------+ |
| | ... | | ... | |
| +--------------+ +--------------+ |
+------------------------------------------------------------+
The bizarre thing to me is that the list_head structures point to other list_head
structures that are embedded in other structures. When one needs to access a
non list-head member of the structure one has to pass both the structure type and
the list_head member name to a macro that figures out how to subtract the offset
of the list_head member of the structure from the address of that list_head to
get the address of the structure, and then casts that as the structure type so
that members can be accessed.
There is similar code in FreeBSD kernel. Embedding head and next ptrs reduces
memory allocation and improves cache locality somewhat. Since C doesn't have
generics, they try to gain the same functionality with macros. See
https://github.com/freebsd/freebsd/blob/master/sys/sys/queue.h
Not that this is the same as what Linux does (which I haven't dug into) but
I suspect they may have had similar motivation.
Not sure that I understand.
A list_head structure takes exactly the same amount of space as a pair of pointers,
so the memory allocation should be identical.
I also don't see the cache locality. Both the embedded list_head and the start
of the structure are accessed in either case; once the programmer has played
guess-the-type the list macro returns a pointer to the start of the structure.
So again, I'm just commenting on what I'm seeing in linux, and while it is
possible that I may be misunderstanding something so far I don't think so.
Jon