> From: Warner Losh
>> On Mon, Nov 4, 2024 at 8:14PM Larry McVoy wrote:
>> The bmap implementations I saw were bit for bit identical, same code,
>> same variables, same style, same indentation. I'm 100% sure they were
>> not independent.
> They are different in 4.3BSD. They are different in 4.2BSD (but less
> different). The underlying filesystems are different on disk, so the
> routines have to be different.
That last sentence points out something important that people need to remember
in this discussion: in between 4.1 and 4.2 (technically, in 4.1B), BSD
switched to the BSD Fast File System, so I very much doubt that the low-level
(i.e. logical file block to disk block) file system code in anything after
4.1A looks much like the AT+T low-level file system code. (I have no idea how
the BSD code compares to the Linux file system code, but that's between the
Linux people, and Berkeley.)
Yes. The original Unix code was copied and redone somewhat. It was still likely
a derivative work (which is why AT&T forced Berkeley to redo it for 4.4-lite), but
it was roughly 25% the same, 25% similar but functionally identical, and 50% new
for UFS. Filesystem layout is filesystem layout, though, so some similarities persisted.
3bsd added dtofsb() calls. 4bsd added code to make accessing the indirect blocks
more reliable and made writing directories more reliable. 4.1 was identical to 4bsd.
4.2 changed a lot. The 32V bmap was 112 lines long, while 4.2's was 196 lines, with
the following diffstat:
1 file changed, 141 insertions(+), 57 deletions(-)
So by this measure, over half of the new function was new (though most of the
comments were still the same). It did have the same structure, but structure isn't
necessarily copyrightable since filesystem layout code will be similar between
filesystems that are write-in-place. Looking at the diff, there's one stretch of 15
identical lines, but otherwise there are changes (mostly additions) every
few lines. A substantial re-write. These days, most open source authors would
replace the copyright statement with their own for such an extensive rewrite
since the diff was over 2x the size of the original file (another very imperfect
measure). The comments remaining identical is troublesome, though, because
they are the most creative part of the code and the part with the most
freedom, while the for loops and such are largely dictated by the problem,
the C language, and customary style.
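
As a rough check on that arithmetic, here is a minimal Python sketch. It uses
only the numbers quoted above (112 lines in 32V, 141 insertions, 57 deletions);
the function name diffstat_summary is made up for illustration. Note that
diffstat counts a modified line as one deletion plus one insertion, so these
ratios treat even a lightly edited line as new.

```python
def diffstat_summary(old_lines, insertions, deletions):
    """Derive simple ratios from a diffstat line (hypothetical helper)."""
    # Every deleted line leaves the old file, every inserted line joins the new one.
    new_lines = old_lines - deletions + insertions
    return {
        "new_lines": new_lines,
        # Share of the new function that diff reports as inserted.
        "fraction_inserted": insertions / new_lines,
        # Share of the old function that survived untouched.
        "fraction_old_kept": (old_lines - deletions) / old_lines,
        # Changed lines (insertions + deletions) relative to the old function.
        "changed_vs_old": (insertions + deletions) / old_lines,
    }


summary = diffstat_summary(old_lines=112, insertions=141, deletions=57)
print(summary["new_lines"])                    # 196, matching the 4.2 line count
print(round(summary["fraction_inserted"], 2))  # 0.72
print(round(summary["fraction_old_kept"], 2))  # 0.49
print(round(summary["changed_vs_old"], 2))     # 1.77
```

The changed-line count alone comes to about 1.8 times the 32V function; the
"over 2x" figure presumably also counts the context lines and hunk headers in
the diff, which this sketch does not model.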
Between 4.2 and 4.3, the changes were around the edges of this function
though not in this function (I was remiss in not chasing down the bare diff
I did last night). By net.2 it was re-written again, moving most of the function
of bmap elsewhere, so that almost nothing remained from the original 32V
in the original bmap function (though a quick grep shows that parts did move
elsewhere). In net.2 it's back down to 75 lines, shorter even than in 32V (though
that's a bit deceptive, since the code had moved elsewhere, where it was also
largely reworked).
diff reports only lines like '{', '}', '/*', '*/' and a few simple assignments
(bap = bp->b_un.b_daddr;) and function calls (brelse(bp)) as being the same, with
all the comments different or gone from this function. That said, a fair number of
the diffs were due to changes in the "buffer cache" interface, some formatting
changes and some substitution of #defines (like NIADDR) for bare constants (3
in this case). These changes were also due to the role of bmap being reduced
and things like balloc being used to handle the details a fair bit differently. And
bits of balloc do resemble bits of the original bmap, but again the structure
had changed somewhat.
The numbers for my diffs and such are based on Kirk's disks, but can also
be tested by looking at the links I posted earlier or downloading and extracting
the sources from the TUHS archive.
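
For anyone repeating the comparison on extracted sources, a minimal sketch of
the line counting in Python follows. It uses difflib from the standard library
rather than diff(1), so the counts may differ slightly from diffstat output;
count_changes and the toy inputs are made up for illustration and are not the
real bmap text.

```python
import difflib


def count_changes(old_text, new_text):
    """Return (insertions, deletions, unchanged) line counts (hypothetical helper)."""
    old = old_text.splitlines()
    new = new_text.splitlines()
    # autojunk=False keeps very common lines (like a lone '}') in the matching.
    matcher = difflib.SequenceMatcher(a=old, b=new, autojunk=False)
    insertions = deletions = unchanged = 0
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            unchanged += i2 - i1
        else:
            # 'replace' contributes to both counts, as in diffstat.
            deletions += i2 - i1
            insertions += j2 - j1
    return insertions, deletions, unchanged


# Toy stand-ins for two versions of a function, one line per statement.
old_version = "a\nb\nc\n"
new_version = "a\nx\nc\nd\n"
ins, dels, same = count_changes(old_version, new_version)
print(ins, dels, same)  # 2 1 2
```

To run it on real files, read each extracted function into a string and pass
the two strings in the same way.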
The bmap() function I've extracted from different versions:
Warner