John Cowan wrote in
<CAD2gp_TgTFL5agm8Z=immnGiMkpELL-wM_ZXos8OcKngw=2DLw(a)mail.gmail.com>:
|On Mon, Mar 20, 2023 at 4:48 PM Steffen Nurpmeso <steffen(a)sdaoden.eu> \
|wrote:
|
|> However note that even something like "uppercase this string"
|> cannot be done the right way, because a truly Unicode aware
|> operation needs to look at the entire string (sentence), because
|> there may be interdependencies that modify the result.
|
|If you are talking about downcasing Greek Σ, then it's true that always
|downcasing Σ to σ is inadequate. Unicode specifies that if the Σ appears
|before a space or punctuation mark, it downcases to ς instead. But this is
|not always correct.
|
|For example, if the string "ΦΙΛΟΣ." is the word "φιλος" (meaning 'beloved'
|or 'friend') at the end of a sentence, "φιλος." is the correct downcasing.
|But if it is the abbreviation for "φιλοσοφία", meaning "philosophy", then
|the correct downcasing is "φιλοσ." So getting this right is an AI-complete
|problem which neither Unicode nor ICU can solve.
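A minimal Python sketch makes the point concrete. It assumes CPython 3.3 or later, whose str.lower() applies the Unicode Final_Sigma context rule. That rule only looks at the neighbouring letters, so both readings of "ΦΙΛΟΣ." come out the same. The dictionary labels are made up for illustration.

```python
# CPython 3.3+ str.lower() applies Unicode's Final_Sigma condition:
# a capital sigma that follows a cased letter and is not followed by
# another cased letter lowers to final sigma (U+03C2), else to U+03C3.
samples = {
    "word at end of sentence": "ΦΙΛΟΣ.",
    "abbreviation of ΦΙΛΟΣΟΦΙΑ": "ΦΙΛΟΣ.",  # same input, different intent
    "full word": "ΦΙΛΟΣΟΦΙΑ",
}
for meaning, text in samples.items():
    # The rule cannot see intent, so both "ΦΙΛΟΣ." cases print "φιλος."
    print(f"{meaning}: {text!r} -> {text.lower()!r}")
```

Getting "φιλοσ." for the abbreviation would need knowledge that no context-free casing rule has.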
Oh, I wish I could speak, read and write (ancient) Greek.
Unfortunately, after English I either had to go to another school
or choose between French and Latin (I would have given anything
for Chinese, Japanese and/or Russian), so I chose Latin. And
although I started out as one of the three best in class, I then
watched an interview that the wonderful Lea Rosh did with a CDU
("Republican") state secretary, and he spoke Latin; and while she
repeatedly said "I understand you, but what about the audience?",
I, as a young teenager, was _so_ pissed that I "quit", as in
Günter Grass's novel "The Tin Drum". That lowered my grade point
average a bit.
But yes, I think quite a lot of languages have this problem. Even
my own native language, German, has it when converting the
lowercase sharp s (ß), although for over a hundred years some have
tried to establish an uppercase variant, which the Swiss tongue
has. (Mind you, after WWII a certain uppercase SS was even
forbidden, at least in some letterforms, like the one used by the
US rock band Kiss ..not.)
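A small Python sketch shows the German case. It assumes CPython 3.3 or later, whose str.upper() and str.lower() use Unicode's full case mappings. Under those mappings the default uppercase of "ß" is the two letters "SS". Lowercasing "SS" therefore cannot recover "ß", even though a capital sharp s (U+1E9E) is encoded.

```python
# Full case mappings: U+00DF (ß) uppercases to "SS", so the round trip
# upper() -> lower() loses the information that an ß was there.
word = "Straße"
upper = word.upper()
print(upper)          # STRASSE
print(upper.lower())  # strasse, not straße

# The capital sharp s (U+1E9E) exists, but default uppercasing does not
# produce it; lowercasing it does give back ß.
print("\u1e9e".lower() == "ß")  # True
```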
If you ask on the Unicode mailing list, you will be told to
convert only entire sentences. But according to the Unicode FAQ,
Greek sigma seems to be a very special case.
--steffen