Suppose you have two strings of 8-bit bytes, and you'd like
to compare them lexicographically (left to right, byte by byte).
An oracle tells you the length of the strings, so maybe you have
3 ram
4 ramp
You can just do an strncmp on the two strings, using the minimum
of the two lengths (3 in the example). If they differ (they didn't),
you are done. If the strings are of the same length (they aren't),
they are equal. Otherwise, the shorter (a prefix of the longer)
compares low. Ho-hum.
Suppose each comparand is a sequence of such strings, and you
want to break ties on initial components of such sequences
using subsequent components (if any). But you have to combine
them as a single string and the oracle only tells you the total length.
You can't just concatenate them together, or
(3 ram, 4 part) => 7 rampart
(4 ramp, 3 art) => 7 rampart
(7 rampart) => 7 rampart
and they all look equal, but they're not supposed to be.
The problem is that some components are proper prefixes
of the corresponding component. We can sneak past the end
of one component and compare bytes from different components,
something that cannot be allowed. A collection of components
is said to have "the prefix property" if no component is
a proper prefix of any other component. If there is some
byte that cannot occur in some component, we can tack that
byte onto the component, and the components will then have
the prefix property. Newline terminated lines have the
prefix property, but we cannot just concatenate such
components and do an strncmp, because newlines compare low
to most bytes, but high to some legitimate bytes, like tabs,
so they don't break ties correctly.
Null terminators to the rescue! These induce the prefix property,
and compare low to everything else. Using blanks to stand in for
hard-to-parse null bytes, we get
(4 ram , 5 part ) => 9 ram part
(5 ramp , 4 art ) => 9 ramp art
(8 rampart ) => 8 rampart
The strncmp on the results establishes the desired order.
The null terminator on the final component is optional.
The oracular length determines how much of the component
is significant. But you have to be consistent. The final
component must always, or never, have the "optional" terminator.
Null-terminated strings (which cannot, themselves, contain a null)
are not the only components having the prefix property.
Fixed length strings cannot, by definition, include proper prefixes,
and they are free to contain any byte. UTF-8 code-points
have the prefix property (I suspect this was no accident),
so the null-terminated concatenation of non-null UTF-8
code-points have the prefix property and break ties appropriately
(assuming that the code-points themselves establish the
correct order for what is being compared).
I doubt that this explains the use of null terminated strings,
but for those of use who spend too much time thinking about sorting,
they sure work well.