Cross posting to simh - since much of this has been discussed in the last few days there also....

in for penny, in for pound ... here is the history ...  man ... I lived this and I'll need a strong drink later tonight after I write it all up.


On Tue, Jul 28, 2020 at 7:04 PM Will Senn <will.senn@gmail.com> wrote:
I recall having to do something with cont.a files, which are not present on these images. So, my questions is, does anyone know of or have an actual 2bsd tape/tape image?
cont.a is a tp-v6 and earlier ism. 

DECtape had a directory at the front of the tape (think superblock/ilist), but could do cool things and be treated more like a disk.
When tp was created for a very early version of Unix (I'm not sure which, could be V2), Ken/Dennis et al had DECtape units and so the original scheme followed the media.   This also meant that the program could write files and go back and update the directory, which is a no-no with many tape systems.  Then research got a 9-track unit.   So tp was changed to calculate how much space was going to be needed, write the directory, then the datablocks.  All good ... except...

9-track could write more files than the directory could take.   So for many years, people would use the ar(1) program to take a number of files in a directory, create a file called cont.a and then delete the files.  Then the tree would be written with tp; when you read it back, you reversed the ar(1) process.  If you look at the USENIX/Harvard tape in the TUHS archive you'll see this scheme in use.

BTW: tp was written in assembler and all the data structures were using PDP-11 binary formats.  Eventually, Harvard wrote stp (super-tp) in C  (which is on the USENIX tape Warren has in the archives) that worked like the original assembler tp but also put a redundant directory at the end of the tape.  Another issue with tp was that if you had a bad block in the first few blocks you could not decode the rest of the tape.  [There were some other issues with the UNIX tree structure as disks got bigger but I'm going to ignore all that - other than to say, tp had lived its life].

Enter Mashey and the PWB 1.0 folks (which is based on V6).  Someone in USG created cpio (and volcpy) as part of PWB 1.0.   Like tp it was a PDP-11 binary format, but unlike tp, the tape directory is threaded, i.e. block one describes the first file only and includes the size of the following file, then the file itself, then a new directory block for the next file and again that file (rinse and repeat).  So it improved on tp in the directory threading, but was still binary and, for reasons I'll leave out, had a different user interface.
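
To make "threaded" concrete, here is a toy sketch in Python.  This is NOT the real cpio (or tar) header layout; the two-field header and the names HDR, write_threaded and read_threaded are made up purely for illustration.  The point is only that each file carries its own small directory entry, so there is no fixed-size directory at the front of the tape to overflow:

```python
import struct

# Hypothetical per-file header: name length (16-bit) and data length (32-bit).
# The real cpio binary header has more fields; this is only a toy layout.
HDR = struct.Struct("<HI")

def write_threaded(files):
    """Lay out (name, data) pairs as header, name, data, header, name, data, ..."""
    out = bytearray()
    for name, data in files:
        raw = name.encode("ascii")
        out += HDR.pack(len(raw), len(data)) + raw + data
    return bytes(out)

def read_threaded(buf):
    """Walk the archive front to back; each header says how far to skip."""
    pos = 0
    members = []
    while pos < len(buf):
        name_len, data_len = HDR.unpack_from(buf, pos)
        pos += HDR.size
        name = buf[pos:pos + name_len].decode("ascii")
        pos += name_len
        members.append((name, buf[pos:pos + data_len]))
        pos += data_len
    return members

archive = write_threaded([("a.c", b"int a;\n"), ("b.c", b"int b;\n")])
print(read_threaded(archive))
```

The writer never has to know the total file count up front, which is exactly what the directory-first tp layout could not offer on 9-track.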

As part of V7, Ken wrote a new program, tar [you can ask him why].  Like cpio he used a threaded tape directory, but unlike cpio it was always ASCII and not PDP-11 specific.  Furthermore, the user interface was made to parrot tp.  So, certainly, it had the advantage that changing tp scripts to use tar was pretty easy, i.e. s/tp/tar/ ... not so for cpio.  And it was muscle memory compliant.

For PWB 2.0, cpio was updated to allow a -c option to write the header in ASCII and -s to byte swap the binary.   But the damage had been done ...

Thus began 'tar wars' which was a battle that raged officially over tape archive formats, but really was an argument about user interfaces.  Since tar was part of Research and the Universities and commercial people used it, only USG and the folks inside the Bell System were using cpio, as officially none of us had it since PWB was not released to us (although thanks to many AT&T employees doing an OYOC year, many schools like UCB, MIT and CMU all had the sources to cpio anyway -- for instance you'll see it hidden away on Kirk's CD).

I personally had used both, and preferred tar for ease of use and ASCII directories.  But note I had written car at Masscomp, but never tpio.  car was our trick to use the file scripting list that cpio could do, but create tar format tapes - which was handy.  I never wrote tpio, which would have been cpio format using the tp/tar user interface, as I did not need it.

Roll forward to the /usr/group UNIX standard that Heinz chaired.  We ended up not being able to agree on a distribution format, but the ISVs were PO because now they could create UNIX programs that might actually work across systems, but they had no standard way to distribute them.
Roll forward again to IEEE.  Heinz's committee was officially disbanded (story discussed elsewhere) and we were created as IEEE P1003 with Jim Isaak as Chair. This time the ISVs said we had to have a distribution format.  Since *.1 was only an API we were allowed to avoid the user interface issue and only examine the on-tape format.

It turns out, while it seems to have been unintended, Ken's original V7 implementation has an interesting coding feature/bug which is what clinched the deal.   When Ken created the directory block for each file, he did a bcopy of 0's to the buffer before he wrote the data that fills it in.  Then when he calculated the checksum for the directory header block, he summed the entire block (the unused part of which, because of the bcopy, was zeros).  This means if you write beyond the end of Ken's original header and include that extra data in the chksum, the original program will ignore the new information but accept the directory block as valid, i.e. he had built an extension mechanism into the tar on-tape format.
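
Here is a rough sketch of that trick in Python rather than Ken's C, so take it as an illustration and not his code.  The field offsets (name at 0, size at 124, chksum at 148, old header ending at byte 257 of a 512-byte block) are the classic tar layout; the function names and the octal formatting details are my simplifications, and old_reader_parse is a hypothetical stand-in for an old tar that only looks at the original fields:

```python
BLOCK = 512
CHKSUM_OFF, CHKSUM_LEN = 148, 8   # chksum field in the classic tar header
V7_END = 257                      # first byte past the original header fields

def tar_checksum(block: bytes) -> int:
    """Sum ALL 512 bytes, counting the chksum field itself as spaces."""
    assert len(block) == BLOCK
    blanked = (block[:CHKSUM_OFF] + b" " * CHKSUM_LEN
               + block[CHKSUM_OFF + CHKSUM_LEN:])
    return sum(blanked)

def make_header(name: bytes, size: int, extra: bytes = b"") -> bytes:
    """Build a zero-filled block; 'extra' is new data past the old header."""
    block = bytearray(BLOCK)                      # the zeroed buffer
    block[0:len(name)] = name
    block[124:136] = ("%011o " % size).encode("ascii")
    block[V7_END:V7_END + len(extra)] = extra     # the extension area
    csum = tar_checksum(bytes(block))
    block[148:156] = ("%06o" % csum).encode("ascii") + b"\0 "
    return bytes(block)

def old_reader_parse(block: bytes):
    """Hypothetical old reader: checks the whole-block sum, reads old fields only."""
    stored = int(block[148:154].decode("ascii"), 8)
    if stored != tar_checksum(block):
        raise ValueError("directory checksum error")
    name = block[0:100].split(b"\0", 1)[0]
    size = int(block[124:135].decode("ascii"), 8)
    return name, size

plain = make_header(b"hello.c", 42)
extended = make_header(b"hello.c", 42, extra=b"owner=clem")
print(old_reader_parse(plain), old_reader_parse(extended))
```

Both blocks come back as the same name and size: the old reader never looks past byte 257, yet the checksum still validates because it always covered the whole block.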

cpio's ASCII on tape format was not able to do that as the checksum used a sizeof(header struct) in the checksum routine.

USTAR was born ... Ken had written things like the UID/GID as ASCII representations of the integer value in the original header.  USTAR added the ASCII representation of the username and the group name since that was more often portable between systems than the numbers.   There were other additions like more room for the pathname, new file types, etc.  But the key is that a USTAR tape can be read by the original V7 (and follow-on) tar implementations, although they may not recognize all the file types or use all of the new information.
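
If you want to see those fields for yourself, a quick way is to have Python's standard tarfile module build a USTAR header and then pick the raw block apart.  The offsets below (uid at 108, magic at 257, uname at 265, gname at 297) are the POSIX ustar layout; the helper name ustar_header_fields and the sample uid/gid values are just for this sketch:

```python
import tarfile

def ustar_header_fields(uname: str, gname: str) -> dict:
    """Build one USTAR header block and decode a few fields from the raw bytes."""
    info = tarfile.TarInfo(name="hello.c")
    info.uid, info.gid = 1001, 100          # sample numeric ids
    info.uname, info.gname = uname, gname   # the names USTAR added
    block = info.tobuf(format=tarfile.USTAR_FORMAT)

    def text(lo, hi):
        # Fields are NUL-padded ASCII; keep what precedes the first NUL.
        return block[lo:hi].split(b"\0", 1)[0].decode("ascii")

    return {
        "length": len(block),
        "uid": int(block[108:116].rstrip(b"\0 ").decode("ascii"), 8),
        "gid": int(block[116:124].rstrip(b"\0 ").decode("ascii"), 8),
        "magic": block[257:262],            # lives past the old V7 fields
        "uname": text(265, 297),
        "gname": text(297, 329),
    }

print(ustar_header_fields("clem", "staff"))
```

The numeric ids are still there as octal ASCII in the old positions, and everything new sits in the part of the block that an old tar never examined.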

A few years later during *.2 discussions, we finally got into the user interface stuff and pax(1) was born.  Knowing my hack with car, Keith Bostic, Jim McGuiness and I wrote up a description of a program that could work with both user interface schemes.  USENIX provided funding for a student to implement it and put the sources out on comp.unix.sources at some point.  That proposal was originally accepted as the first tape user interface program in *.2 [a few years later after I stopped being part of the committee, the USG folks did get an alternate CPIO format accepted and cpio as an allowed program.   USENIX paid to have the program updated to operate like cpio if it was called that, pure V7 tar if called that and, if called pax, to use USTAR].

'nuf said ... I hope.

Clem