On a distantly related note, when I worked part time for the MIT administration in the 60's, we'd do processing on the 7094 in the main comp center, then bring a tape of results back to our office to do the printing and card punching. For reasons that were never explained to me, the blocks of print and punch data were combined on a single tape, and were distinguished by being written with different parities. If you attempted to read a block with the correct parity setting, the tape could stream. If you guessed wrong, you had to stop, back up a block, change the parity, and try again. So I wrote a simple 360 assembly program to keep track how often blocks of each parity followed the observed parities of the previous 8 blocks. Essentially, 256 pairs of parity observations, indexed by the previous 8 parity observations. Blocks of print and punch data tended to fall into patterns that depended on what job was being run on the 7094, so detecting those patterns and correctly anticipating the upcoming parity made the tapes move much more smoothly. It was fun to watch the tape at the start of a run. It was mostly just a coin-toss, so the tape was jerking around fitfully. As the patterns started to emerge, the predictions got better, the jerking got less and less common, and the tapes were streaming most of the time. My introduction to learning algorithms.