[TUHS] Re: Bell Foreign-Language UNIX Efforts

20 Mar 2023

On Mon, Mar 20, 2023 at 4:48 PM Steffen Nurpmeso <steffen(a)sdaoden.eu> wrote:
> However note that even something like "uppercase this string"
> ...
> cannot be done the right way, because a truly Unicode aware
> operation needs to look at the entire string (sentence), because
> there may be interdependencies that modify the result.

If you are talking about downcasing Greek Σ, then it's true that always
downcasing Σ to σ is inadequate.  Unicode specifies that if the Σ appears
before a space or punctuation mark, it downcases to ς instead.  But this is
not always correct.
For example, if the string "ΦΙΛΟΣ." is the word "φιλος" (meaning 'beloved'
or 'friend') at the end of a sentence, "φιλος." is the correct downcasing.
But if it is the abbreviation for "φιλοσοφία", meaning "philosophy", then
the correct downcasing is "φιλοσ."  So getting this right is an AI-complete
problem which neither Unicode nor ICU can solve.
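
To make the ambiguity concrete, here is a small Python sketch. It relies on
CPython's str.lower(), which applies the context-sensitive final-sigma rule.
The helper lower_greek and its is_abbreviation flag are hypothetical names
invented for this illustration; they assume the caller already knows, by
some means outside the string itself, which reading is intended.

```python
# str.lower() picks ς for a Σ that ends a word and σ elsewhere.
sentence_final = "ΦΙΛΟΣ.".lower()
print(sentence_final)            # φιλος.
print("ΦΙΛΟΣΟΦΙΑ".lower())       # φιλοσοφια


def lower_greek(token: str, is_abbreviation: bool = False) -> str:
    """Lowercase a Greek token (hypothetical helper, not a library API).

    If the caller says the token is a truncated word, any final sigma
    produced by the context rule is turned back into a medial sigma.
    """
    lowered = token.lower()
    if is_abbreviation:
        lowered = lowered.replace("ς", "σ")
    return lowered


# Same input string, two different correct answers:
print(lower_greek("ΦΙΛΟΣ."))                        # φιλος.  (the word)
print(lower_greek("ΦΙΛΟΣ.", is_abbreviation=True))  # φιλοσ.  (the abbreviation)
```

The flag is the whole point: nothing in the string "ΦΙΛΟΣ." tells the
function which answer is wanted, so the information has to come from the
caller.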
