On Mon, Mar 20, 2023 at 4:48 PM Steffen Nurpmeso <steffen(a)sdaoden.eu> wrote:
> However note that even something like "uppercase this string"
> cannot be done the right way, because a truly Unicode aware
> operation needs to look at the entire string (sentence), because
> there may be interdependencies that modify the result.

If you are talking about downcasing Greek Σ, then it's true that always
downcasing Σ to σ is inadequate. Unicode specifies that if the Σ appears
before a space or punctuation mark, it downcases to ς instead. But this is
not always correct.
For example, if the string "ΦΙΛΟΣ." is the word "φίλος" (meaning 'beloved'
or 'friend') at the end of a sentence, "φιλος." (with final ς) is the
correct downcasing. But if it is the abbreviation for "φιλοσοφία", meaning
"philosophy", then the correct downcasing is "φιλοσ." (with medial σ),
because the sigma is not really word-final. So getting this right is an
AI-complete problem which neither Unicode nor ICU can solve.
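
To make the point concrete, here is a small sketch in Python, whose built-in str.lower() (as far as I know, in Python 3.3 and later) applies the context-sensitive final-sigma rule. The helper name downcase is only for illustration. It shows the rule picking ς before the period, which is right for the word and wrong for the abbreviation, since the function has no way to know which one was meant.

```python
# Sketch: how a context-sensitive lowercasing rule treats Greek capital sigma.
# Assumes CPython 3.3+, where str.lower() handles the final-sigma context.

def downcase(text: str) -> str:
    """Lowercase text using Python's built-in Unicode-aware str.lower()."""
    return text.lower()

# Sigma preceded by a cased letter and followed only by punctuation:
# the rule chooses final sigma (ς).
word = downcase("ΦΙΛΟΣ.")

# Sigma in the middle of a word: the rule chooses medial sigma (σ).
compound = downcase("ΦΙΛΟΣΟΦΙΑ")

# A lone sigma has no preceding cased letter, so it stays medial (σ).
isolated = downcase("Σ")

print(word)      # φιλος.
print(compound)  # φιλοσοφια
print(isolated)  # σ

# If "ΦΙΛΟΣ." was the abbreviation of "φιλοσοφία", the desired result
# would have been "φιλοσ.", which no purely textual rule can produce here.
print(word == "φιλοσ.")  # False
```

The same input string yields the same output whichever reading was intended, which is exactly why the choice cannot be made from the characters alone.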